-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Develop #982
base: master
Are you sure you want to change the base?
Develop #982
Conversation
v0.3.11 v0.3.11
black 格式化
this is no need for python3
Tagging version 0.4.0release 0.4.0release
Dockerfile
Outdated
@@ -26,6 +26,8 @@ RUN pip install -r /opt/pyspider/requirements.txt | |||
|
|||
# add all repo | |||
ADD ./ /opt/pyspider | |||
# install seadaka requ | |||
RUN pip install -r /opt/pyspider/requirements-seadaka.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't your package in base docker image, you could create another Dockerfile that base on this image while install your additional dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for that, I remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the patch. Overall looks good, just have some question regarding the behavior change in Elasticsearch. And how did you test it?
@@ -161,4 +171,4 @@ def dbcur(self): | |||
db._update(db.__tablename__, "id = 1", age=16) | |||
assert db._select(db.__tablename__, "name, age").next() == (None, 16) | |||
db._delete(db.__tablename__, "id = 1") | |||
assert [row for row in db._select(db.__tablename__)] == [] | |||
assert [row for row in db._select(db.__tablename__)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior changed from assert the table to be empty to not empty. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because empty list is False in python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert 是 true 断言。原来的测试是说delete之后列表为空,改过之后变成列表非空了。
@@ -19,7 +20,7 @@ def __init__(self, hosts, index='pyspider'): | |||
self.index = index | |||
self.es = Elasticsearch(hosts=hosts) | |||
|
|||
self.es.indices.create(index=self.index, ignore=400) | |||
self.es.indices.create(index=self.index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did API change? I guess it's trying to ignore the IndexAlreadyExistsException error.
https://elasticsearch-py.readthedocs.io/en/7.x/api.html#ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ElasticSearch 是根据 pylint 提示优化的, 这部分做的不好, 没有测试. 我想先停止这次合并请求. 我漏掉了很多测试.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我对 ES 的使用 不是很熟练. 哪怕是 ElasticSearch client 版本 7.10.0 和 7.17.0 之间也有很多区别. 我觉得直接升级到 最新的版本,并做好测试更好
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, upgrade to the latest version is good choice, just make sure it wouldn't raise error when the index has already created.
obj.update(kwargs) | ||
obj['updatetime'] = time.time() | ||
return self.es.update(index=self.index, doc_type=self.__type__, | ||
body={'doc': obj}, id=name, refresh=True, ignore=404) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, refresh is to make sure the change to project are visible immediately. The updates on projectdb is expected to be low, that while used as database, it's exepcted for user to see the change immediately.
@@ -57,8 +59,7 @@ def get_all(self, fields=None): | |||
yield record['_source'] | |||
|
|||
def get(self, name, fields=None): | |||
ret = self.es.get(index=self.index, doc_type=self.__type__, id=name, | |||
_source_include=fields or [], ignore=404) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember, but doesn't the default behavior for ProjectDB is to return None when a project doesn't exists?
|
||
self.es.indices.create(index=self.index, ignore=400) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this cause IndexAlreadyExistsException error when the resultdb are already exists?
@@ -36,7 +36,7 @@ def projects(self): | |||
ret = self.es.search(index=self.index, doc_type=self.__type__, | |||
body={"aggs": {"projects": { | |||
"terms": {"field": "project"} | |||
}}}, _source=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the API changed to source
, that this API is meant to return just project names, that the source doesn't matter.
https://elasticsearch-py.readthedocs.io/en/latest/api.html#elasticsearch.Elasticsearch.search
@@ -62,7 +62,7 @@ def select(self, project, fields=None, offset=0, limit=0): | |||
else: | |||
for record in self.es.search(index=self.index, doc_type=self.__type__, | |||
body={'query': {'term': {'project': project}}}, | |||
_source_include=fields or [], from_=offset, size=limit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to source_includes
?
https://elasticsearch-py.readthedocs.io/en/latest/api.html#elasticsearch.Elasticsearch.search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore this. My bad.
It's still their in 7.10.0
setup.py
Outdated
@@ -67,6 +67,8 @@ | |||
'Programming Language :: Python :: 3.5', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't you drop support for 3.6 and lower?
放弃 3.6 以下版本
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为 python2 已经停止维护, 3.4, .3.5 比较低的版本如果也维护会消耗大量的精力,而且这些版本使用的人应该比较少了.
很多依赖库也会慢慢放弃低版本的支持, 所以我想放弃低版本的维护.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我的意思是去掉setup设置的 3.5 支持。
@lusi1990 great work! lets roll this out |
对 3.7,3.8, 3.9 进行适配
升级一些依赖库, 比如 flask
删除一些适配 python2.7 的代码
优化部分代码
调整了代码格式