Search via Elasticsearch #4096

Merged: 110 commits, merged on Feb 5, 2017
Changes from 7 commits

Commits:
afbd1c0
add elasticsearch modules and define base models
pierre-24 Dec 23, 2016
221b5ff
basic indexing with and without flags
pierre-24 Dec 23, 2016
cb01421
add migration for post
pierre-24 Dec 24, 2016
e61ffc1
check if pk and id match
pierre-24 Dec 24, 2016
280a4fe
Include forum into search
pierre-24 Dec 24, 2016
682010f
Merry christmas, @vhf and @artragis
pierre-24 Dec 25, 2016
c68752b
Attempt to index published contents and chapters (with some tricks)
pierre-24 Dec 27, 2016
f72b30f
use a slightly different trick for chapters
pierre-24 Dec 27, 2016
eb8aaed
Index text directly with PublishedContent if article or mini-tuto
pierre-24 Dec 27, 2016
fcc1cfa
add custom analyzer
pierre-24 Dec 27, 2016
a6f341c
Improve analyzer for programming language
pierre-24 Dec 28, 2016
987a7cf
argparse is magic, thx @artragis !
pierre-24 Dec 28, 2016
f7439ae
asciifolding is a wrong choice after all
pierre-24 Dec 28, 2016
d7addd3
implementation of basic search (no boosting, no type selection, no hi…
pierre-24 Dec 29, 2016
ddd2e72
elif instead of if
pierre-24 Dec 29, 2016
0e41258
weighting the different models
pierre-24 Dec 29, 2016
2a8b603
improve index creation (set mapping at the same time)
pierre-24 Dec 29, 2016
40bee12
Improve stuffs:
pierre-24 Dec 30, 2016
a278ef5
smaller is better (thx again, @artragis)
pierre-24 Dec 30, 2016
1c73993
to be consistent, use text_html for posts
pierre-24 Dec 30, 2016
553fee8
fix templatetags to prevent errors coming from fixtures
pierre-24 Dec 30, 2016
282ff34
attempt highlighting with ES
pierre-24 Dec 30, 2016
1f32be3
dont use order=score if continuous text
pierre-24 Dec 30, 2016
c665752
boost is now also performed based on the other criteria
pierre-24 Dec 30, 2016
437f913
Implement correct deletion (with weird relationships for published c…
pierre-24 Dec 30, 2016
8a13142
invisible posts must still be indexed but wiped out from the query
pierre-24 Dec 30, 2016
dc96397
form is not invalid
pierre-24 Dec 30, 2016
0240817
correct way to connect, and prevent work when there is no connection
pierre-24 Dec 30, 2016
f166f51
shut up, Travis
pierre-24 Dec 30, 2016
bd062bf
prevent search page from failing while ES is not there
pierre-24 Dec 31, 2016
2b99fdc
more control over index creation through settings
pierre-24 Jan 2, 2017
1d7c0a6
use a custom tokenizer to get rid of all the special characters inclu…
pierre-24 Jan 2, 2017
2b92ec7
Remove the search CSS from the old home page (pre-ZEP 4)
Situphen Dec 29, 2016
4a25510
Use the home page search bar on the re… page
Situphen Dec 29, 2016
0b3f61f
Improve form:
pierre-24 Jan 2, 2017
437c961
protect LaTeX from stemming as well
pierre-24 Jan 2, 2017
669954c
Logging instead of printing
pierre-24 Jan 2, 2017
32ba8af
Remove haystack and use the brand new search
pierre-24 Jan 2, 2017
d6f4cbe
yet another term to protect
pierre-24 Jan 2, 2017
6104d9f
add documentation
pierre-24 Jan 3, 2017
83d7224
update to version 5 of elasticsearch-py and use true delete_by_query()
pierre-24 Jan 3, 2017
83e662c
only instance is needed for delete_document_in_elasticsearch()
pierre-24 Jan 3, 2017
14b20a0
es_manager will fail if no connection to ES
pierre-24 Jan 3, 2017
5a28679
add a test for the command es_manager
pierre-24 Jan 3, 2017
48af348
try me, travis
pierre-24 Jan 4, 2017
3544f3e
good. Now, try harder, travis
pierre-24 Jan 4, 2017
22a9b11
refresh index, otherwise it is not available for search!
pierre-24 Jan 4, 2017
4142a93
use custom logger
pierre-24 Jan 4, 2017
28215cb
correct spelling mistakes, thx @vhf!
pierre-24 Jan 4, 2017
2c22184
add some tests for ESIndexManager
pierre-24 Jan 4, 2017
13ef3a1
add test for view (more will come, but sleep first!)
pierre-24 Jan 4, 2017
2c14aac
New tests for view
pierre-24 Jan 5, 2017
f0ce199
Publishable is not published
pierre-24 Jan 6, 2017
bf36ebb
online editor is against pep 8
pierre-24 Jan 6, 2017
869df81
Let's be coherent with tutorialv2
pierre-24 Jan 6, 2017
944d5a6
old search is past history!
pierre-24 Jan 6, 2017
c70d210
upgrade doc as well
pierre-24 Jan 6, 2017
98f786a
Use SearchForm on home page
pierre-24 Jan 8, 2017
d662c7a
Merge remote-tracking branch 'upstream/dev' into add_elasticsearch
pierre-24 Jan 8, 2017
28c3677
Fix migration conflict
pierre-24 Jan 8, 2017
79cc588
where does this error come from?
pierre-24 Jan 8, 2017
7feac1b
Merge branch 'dev' of https://github.com/zestedesavoir/zds-site into …
pierre-24 Jan 8, 2017
c1f4745
Implement modifications of @vhf
pierre-24 Jan 9, 2017
7adb8fb
add information about memory usage
pierre-24 Jan 9, 2017
422dbc3
pip freeze stuffs
pierre-24 Jan 9, 2017
964acf9
update.md
pierre-24 Jan 9, 2017
adebf61
improve .travis.yml, thx to @firm1
pierre-24 Jan 9, 2017
ba6867f
Merge branch 'dev' of https://github.com/zestedesavoir/zds-site into …
pierre-24 Jan 10, 2017
429c8da
Merge branch 'dev' of https://github.com/zestedesavoir/zds-site into …
pierre-24 Jan 11, 2017
92f38ab
use single quote when possible (to keep the work of @vhf intact)
pierre-24 Jan 11, 2017
f7cab5b
correction for @situphen
pierre-24 Jan 15, 2017
d04b82c
Munin plugin
pierre-24 Jan 15, 2017
eb0dd5a
now, it's elasticsearch 5.1.2
pierre-24 Jan 15, 2017
aa05f19
snakecase for subcommands
pierre-24 Jan 15, 2017
bda0ea0
mark_keywords is now in settings.py
pierre-24 Jan 15, 2017
4ff0a26
of course, I need to change the tests as well --"
pierre-24 Jan 15, 2017
da373f6
Post is directly hidden in ES
pierre-24 Jan 15, 2017
ea150e8
add note to contributors
pierre-24 Jan 15, 2017
8f6d2c0
change shards and replicas
pierre-24 Jan 15, 2017
1eacfcd
use ES_Q() instead of Q()
pierre-24 Jan 18, 2017
38ebf2d
add thumbnail and tags
pierre-24 Jan 18, 2017
9671892
search filters below the bar and not in sidebar
pierre-24 Jan 18, 2017
76e8748
mark corresponding posts updatable if topic is moved
pierre-24 Jan 20, 2017
3f3f28a
Delete previous chapters before reindexation
pierre-24 Jan 21, 2017
9f6192d
Improve update.md
pierre-24 Jan 21, 2017
01c5890
Merge branch 'dev' of https://github.com/zestedesavoir/zds-site into …
pierre-24 Jan 21, 2017
a5b632e
fix test and quotes
pierre-24 Jan 21, 2017
ae7dec1
batching the indexing
pierre-24 Jan 21, 2017
e4eeb0e
correct some mistakes
pierre-24 Jan 21, 2017
62e2f9c
update.md
pierre-24 Jan 21, 2017
e3754a4
correct batching !
pierre-24 Jan 21, 2017
0b969e1
Correct the error of @DevHugo
pierre-24 Jan 21, 2017
1e3f160
yield at the end
pierre-24 Jan 21, 2017
38a69d9
there is no need for with there
pierre-24 Jan 21, 2017
fa9f102
print and small optimisation
pierre-24 Jan 21, 2017
55c66d7
skip chapters correctly
pierre-24 Jan 22, 2017
7db240f
slight improvement of the documentation
pierre-24 Jan 22, 2017
de57f24
Batch per model
vhf Jan 22, 2017
3add954
fix tests
vhf Jan 22, 2017
4b4c6fb
Remove the solr/haystack dependencies
vhf Jan 23, 2017
8f02288
Optimize the initial indexing
vhf Jan 23, 2017
df7cd38
tests
vhf Jan 28, 2017
451280c
Group the search contents
vhf Jan 28, 2017
e9fcf7d
Improve the design of the search page
Situphen Jan 21, 2017
1b69595
add commands in Makefile
pierre-24 Jan 29, 2017
e804a2f
some code reviews
pierre-24 Jan 30, 2017
e415c99
done and done
pierre-24 Jan 31, 2017
f4b6221
Merge branch 'dev' of https://github.com/zestedesavoir/zds-site into …
pierre-24 Feb 4, 2017
0fda5a8
fix migrations (again :p)
pierre-24 Feb 4, 2017
c03dca7
Upgrade to latest version
pierre-24 Feb 4, 2017
2 changes: 2 additions & 0 deletions requirements.txt
@@ -2,6 +2,8 @@
pysolr==3.4.0
pygments==2.1.3
python-social-auth==0.2.19
elasticsearch>=2.0.0,<3.0.0
Member Author:
Well, this is ugly; I need to settle on a version and stick to it.

Member:
You're in development; you'll set the latest version once the WIP is finished. (pip freeze and all that)

elasticsearch-dsl>=5.0.0,<6.0.0

# Explicit dependencies (references in code)
Django==1.8.16
34 changes: 34 additions & 0 deletions zds/forum/migrations/0011_auto_20161224_1310.py
@@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
Member:
You know I love you.

Member Author:
<3


from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('forum', '0010_auto_20161112_1823'),
    ]

    operations = [
        migrations.AddField(
            model_name='post',
            name='es_already_indexed',
            field=models.BooleanField(default=False, db_index=True, verbose_name=b'D\xc3\xa9j\xc3\xa0 index\xc3\xa9 par ES'),
        ),
        migrations.AddField(
            model_name='post',
            name='es_flagged',
            field=models.BooleanField(default=True, db_index=True, verbose_name=b'Doit \xc3\xaatre (r\xc3\xa9)index\xc3\xa9 par ES'),
        ),
        migrations.AddField(
            model_name='topic',
            name='es_already_indexed',
            field=models.BooleanField(default=False, db_index=True, verbose_name=b'D\xc3\xa9j\xc3\xa0 index\xc3\xa9 par ES'),
        ),
        migrations.AddField(
            model_name='topic',
            name='es_flagged',
            field=models.BooleanField(default=True, db_index=True, verbose_name=b'Doit \xc3\xaatre (r\xc3\xa9)index\xc3\xa9 par ES'),
        ),
    ]
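The two flags added by this migration are the bookkeeping behind the index-flagged action of es_manager (the author explains it in the review thread on that command). A minimal plain-Python sketch of that cycle, assuming that a successful indexing run clears es_flagged and that editing an object sets it again; IndexableStub, objects_to_index(), mark_as_indexed() and edit() are hypothetical names, not the PR's API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexableStub:
    """Hypothetical stand-in for a model carrying the two ES bookkeeping fields."""
    pk: int
    text: str
    es_already_indexed: bool = False  # same default as in the migration
    es_flagged: bool = True           # same default: a new object must be indexed


def objects_to_index(objects: List[IndexableStub], force_reindexing: bool = False) -> List[IndexableStub]:
    """A full reindex takes everything; an incremental run only takes flagged objects."""
    if force_reindexing:
        return list(objects)
    return [obj for obj in objects if obj.es_flagged]


def mark_as_indexed(objects: List[IndexableStub]) -> None:
    """Assumption: after a successful upload, clear the flag so the next run skips them."""
    for obj in objects:
        obj.es_flagged = False
        obj.es_already_indexed = True


def edit(obj: IndexableStub, new_text: str) -> None:
    """Assumption: any change to an object sets its flag again."""
    obj.text = new_text
    obj.es_flagged = True


posts = [IndexableStub(1, 'first'), IndexableStub(2, 'second')]
mark_as_indexed(objects_to_index(posts))  # first run: both posts are new, hence flagged
edit(posts[1], 'second, edited')
pending = objects_to_index(posts)  # only the edited post is picked up again
```

This is why es_flagged defaults to True: an object that has never been indexed is automatically part of the next incremental run.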
54 changes: 52 additions & 2 deletions zds/forum/models.py
@@ -10,9 +10,12 @@
from django.core.urlresolvers import reverse
from django.db import models

from elasticsearch_dsl.field import Text, Keyword

from zds.forum.managers import TopicManager, ForumManager, PostManager, TopicReadManager
from zds.notification import signals
from zds.settings import ZDS_APP
from zds.search2.models import AbstractESDjangoIndexable
from zds.utils import get_current_user
from zds.utils import slugify
from zds.utils.models import Comment, Tag
@@ -157,7 +160,7 @@ def can_read(self, user):


@python_2_unicode_compatible
-class Topic(models.Model):
+class Topic(AbstractESDjangoIndexable):
    """
    A Topic is a thread of posts.
    A topic has several states, which are all independent:
@@ -382,9 +385,39 @@ def old_post_warning(self):

        return False

    @classmethod
    def get_es_mapping(cls):
        es_mapping = super(Topic, cls).get_es_mapping()

        es_mapping.field('title', Text())
        es_mapping.field('subtitle', Text())
        es_mapping.field('get_absolute_url', Text(index='not_analyzed'))
Contributor:
index must be a boolean, not a string. I suppose False is equivalent to not_analyzed.

(I'm only leaving one comment, but all 18 occurrences of index='not_analyzed' need to be changed.)

        es_mapping.field('tags', Keyword())

        return es_mapping

    @classmethod
    def get_es_django_indexable(cls, force_reindexing=False):
        """Overridden to remove hidden forums (and prefetch tags)
        """

        query = super(Topic, cls).get_es_django_indexable(force_reindexing)
        return query.prefetch_related('tags').filter(forum__group__isnull=True)

    def get_es_document_source(self, excluded_fields=None):
        """Overridden to handle the case of tags (M2M field)
        """

        excluded_fields = excluded_fields or []
        excluded_fields.extend(['tags'])

        data = super(Topic, self).get_es_document_source(excluded_fields=excluded_fields)
        data['tags'] = [tag.title for tag in self.tags.all()]
        return data
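get_es_document_source() is overridden because a many-to-many relation cannot be copied into the ES document as-is: the topic's tags are flattened into a list of titles, which fits the Keyword() mapping declared for tags. A framework-free sketch of the same pattern; the base behaviour (copy every non-excluded attribute) is only an assumption about AbstractESDjangoIndexable, and every name below is hypothetical:

```python
from typing import Any, Dict, Iterable, List, Optional


class TagStub:
    def __init__(self, title: str) -> None:
        self.title = title


class TopicStub:
    """Hypothetical plain-Python stand-in for Topic."""
    field_names = ['title', 'subtitle', 'tags']

    def __init__(self, title: str, subtitle: str, tags: List[TagStub]) -> None:
        self.title = title
        self.subtitle = subtitle
        self.tags = list(tags)


def base_document_source(obj: Any, excluded_fields: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Assumed generic behaviour: copy every field except the excluded ones."""
    excluded = set(excluded_fields or [])
    return {name: getattr(obj, name) for name in obj.field_names if name not in excluded}


def topic_document_source(topic: TopicStub, excluded_fields: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    # Build a new list so that a list passed by the caller is never modified
    excluded = list(excluded_fields or []) + ['tags']
    data = base_document_source(topic, excluded_fields=excluded)
    # Replace the related objects by plain strings (one keyword per tag)
    data['tags'] = [tag.title for tag in topic.tags]
    return data
```

Creating a fresh exclusion list, rather than calling extend() on the argument, keeps the override free of side effects when a caller passes its own list.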


@python_2_unicode_compatible
-class Post(Comment):
+class Post(Comment, AbstractESDjangoIndexable):
    """
    A forum post written by a user.
    A post can be marked as useful: topic's author (or admin) can declare any topic as "useful", and this post is
@@ -413,6 +446,23 @@ def get_absolute_url(self):
    def get_notification_title(self):
        return self.topic.title

    @classmethod
    def get_es_mapping(cls):
        m = super(Post, cls).get_es_mapping()

        m.field('text', Text())
        m.field('get_absolute_url', Text(index='not_analyzed'))

        return m

    @classmethod
    def get_es_django_indexable(cls, force_reindexing=False):
        """Overridden to remove invisible posts
        """

        q = super(Post, cls).get_es_django_indexable(force_reindexing)
        return q.filter(is_visible=True)


@python_2_unicode_compatible
class TopicRead(models.Model):
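Regarding the review comment on index='not_analyzed': Elasticsearch 5 replaced the 2.x string type with text (analyzed full text) and keyword (exact values), and turned index into a boolean. index: false makes a field unsearchable, so the closest counterpart of a not_analyzed string is a keyword field rather than a text field with index: false. The sketch below expresses that correspondence on raw mapping dicts; upgrade_string_field() is a hypothetical helper, and only the two common index values are handled:

```python
from typing import Any, Dict


def upgrade_string_field(field: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an Elasticsearch 2.x string field definition into its 5.x form."""
    if field.get('type') != 'string':
        return dict(field)  # not a legacy string field: keep it as is

    index = field.get('index', 'analyzed')  # 'analyzed' was the 2.x default
    rest = {key: value for key, value in field.items() if key not in ('type', 'index')}

    if index == 'analyzed':
        # Full-text field: tokenized by an analyzer
        return dict(rest, type='text')
    if index == 'not_analyzed':
        # Exact value indexed as a single term; keyword fields take no analyzer
        rest.pop('analyzer', None)
        return dict(rest, type='keyword')
    raise ValueError('unsupported index option: {!r}'.format(index))


# ES 2.x-style definitions for two of the fields used in this diff
legacy = {
    'title': {'type': 'string'},
    'get_absolute_url': {'type': 'string', 'index': 'not_analyzed'},
}
upgraded = {name: upgrade_string_field(field) for name, field in legacy.items()}
# upgraded == {'title': {'type': 'text'}, 'get_absolute_url': {'type': 'keyword'}}
```

In elasticsearch-dsl terms, Keyword() should serialize to {'type': 'keyword'} and Text(index=False) to {'type': 'text', 'index': False}. If get_absolute_url is only read back from the hits and never queried, index=False is also an acceptable choice.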
25 changes: 25 additions & 0 deletions zds/search2/__init__.py
@@ -0,0 +1,25 @@
# coding: utf-8
from django.conf import settings
from elasticsearch_dsl.connections import connections

DEFAULT_ES_CONNECTIONS = {
    'default': {
        'hosts': ['localhost:9200'],
    }
}

INDEX_NAME = getattr(settings, 'ES_INDEX_NAME', 'elastic')
CONNECTIONS = getattr(settings, 'ES_CONNECTIONS', DEFAULT_ES_CONNECTIONS)


def setup_es_connections():
    """Create connection(s) to Elasticsearch from parameters defined in the settings.

    CONNECTIONS is a dict, where the keys are connection aliases and the values are parameters to the
    ``elasticsearch_dsl.connections.connections.create_connection()`` function (which are directly passed to an
    Elasticsearch object, see http://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch for the options).

    """

    for alias, params in CONNECTIONS.items():
        connections.create_connection(alias, **params)
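Since each value of CONNECTIONS is passed as keyword arguments to an Elasticsearch client, a settings value with several aliases could look like the following. To try the loop without a running cluster, the sketch runs it against a recording stand-in; the secondary alias, the host names and RecordingConnections are hypothetical, while hosts and timeout are regular client options:

```python
from typing import Any, Dict

# Hypothetical value for ES_CONNECTIONS in settings.py
ES_CONNECTIONS = {
    'default': {
        'hosts': ['localhost:9200'],
    },
    'secondary': {
        'hosts': ['es-node-1:9200', 'es-node-2:9200'],  # hypothetical host names
        'timeout': 30,  # request timeout in seconds
    },
}


class RecordingConnections:
    """Stand-in for the elasticsearch_dsl connections registry: it only records the calls."""

    def __init__(self) -> None:
        self.created: Dict[str, Dict[str, Any]] = {}

    def create_connection(self, alias: str = 'default', **kwargs: Any) -> Dict[str, Any]:
        self.created[alias] = kwargs
        return kwargs


def setup_es_connections(connections: Any, conn_settings: Dict[str, Dict[str, Any]]) -> None:
    # Same loop as in zds/search2/__init__.py, with the registry passed explicitly
    for alias, params in conn_settings.items():
        connections.create_connection(alias, **params)


registry = RecordingConnections()
setup_es_connections(registry, ES_CONNECTIONS)
```

With the real elasticsearch_dsl registry, a non-default alias is then retrieved with connections.get_connection('secondary'), or selected through the using argument, e.g. Search(using='secondary').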
Empty file.
Empty file.
48 changes: 48 additions & 0 deletions zds/search2/management/commands/es_manager.py
@@ -0,0 +1,48 @@
# coding: utf-8

from django.core.management.base import BaseCommand, CommandError

from zds.search2 import INDEX_NAME, setup_es_connections
from zds.search2.models import ESIndexManager, get_django_indexable_objects
from zds.tutorialv2.models.models_database import FakeChapter


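class Command(BaseCommand):
    help = 'Manage data from/to ES'

    indexer = None
    models = None

    def __init__(self, *args, **kwargs):
        super(Command, self).__init__(*args, **kwargs)

        # FakeChapter needs to be first. Build a new list for each instance: inserting into a
        # class-level list would add FakeChapter again every time the command is instantiated.
        self.models = [FakeChapter] + get_django_indexable_objects()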

    def add_arguments(self, parser):
        parser.add_argument('action', type=str)

    def handle(self, *args, **options):
        setup_es_connections()
        self.indexer = ESIndexManager(INDEX_NAME)

        if options['action'] == 'setup':
            self.setup_es()
        elif options['action'] == 'index-all':
            self.index_documents(force_reindexing=True)
        elif options['action'] == 'index-flagged':
Member:
So I didn't understand this notion of "flagged". Can you explain it?

Member Author:
Basically, it avoids reindexing what is already indexed and has presumably not changed. When fetching the objects to index, only those with es_flagged=True are retrieved. It's the equivalent of solr's must_reindex, which is probably what I should have named it, but I got misled by the option on the solr side, which is called --only-flagged.

Member:
Thanks 👍

            self.index_documents(force_reindexing=False)
        else:
            raise CommandError('unknown action {}'.format(options['action']))

    def setup_es(self):

        self.indexer.reset_es_index()
        self.indexer.setup_es_mappings(self.models)

    def index_documents(self, force_reindexing=False):

        if force_reindexing:
            self.setup_es()  # remove all previous data

        for model in self.models:
            self.indexer.es_bulk_indexing_of_model(model, force_reindexing=force_reindexing)
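Since add_arguments() receives an argparse-based parser, the valid actions could also be declared with choices, so that an unknown action is rejected while parsing (Django should then report it as a CommandError when the command is run through call_command()) instead of reaching the final else. A standalone sketch with plain argparse; ManagerStub and handle() are hypothetical stand-ins for the command:

```python
import argparse
from typing import List, Sequence

ACTIONS = ('setup', 'index-all', 'index-flagged')


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='es_manager')
    # choices makes argparse reject unknown actions and list the valid ones in --help
    parser.add_argument('action', type=str, choices=ACTIONS)
    return parser


class ManagerStub:
    """Hypothetical stand-in for the command's setup_es() and index_documents() methods."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def setup_es(self) -> None:
        self.calls.append(('setup_es',))

    def index_documents(self, force_reindexing: bool = False) -> None:
        self.calls.append(('index_documents', force_reindexing))


def handle(manager: ManagerStub, argv: Sequence[str]) -> None:
    options = vars(build_parser().parse_args(argv))
    handlers = {
        'setup': manager.setup_es,
        'index-all': lambda: manager.index_documents(force_reindexing=True),
        'index-flagged': lambda: manager.index_documents(force_reindexing=False),
    }
    # No else branch needed: the parser has already validated the action
    handlers[options['action']]()
```

A dispatch dictionary keeps the list of actions and their handlers in one place; with plain argparse, an invalid action makes parse_args() exit with status 2.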