Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate argument #1163

Closed
wants to merge 1 commit into from
Closed

Conversation

mortenhauberg
Copy link

Looking at the documentation, doc_type is listed twice for the bulk function

Duplicate doc_type

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@sethmlarson
Copy link
Contributor

Thanks for opening this! Unfortunately any changes to code under elasticsearch/client/... is automatically generated by the script in utils/generate_api.py and so would necessitate a change there instead. Would you be interested in digging into that?

@mortenhauberg
Copy link
Author

@sethmlarson I've been looking at that script for 15 minutes now.
Can you point me in the right direction 😄 ?

@sethmlarson
Copy link
Contributor

sethmlarson commented Mar 19, 2020

Sure! It loads parameters from this file (along with all the other files in that directory), loads them into Module and API objects and then dumps them into Jinja2 templates.

You can see in the file there are multiple paths for the "bulk" API, one with type and one without type (in the API specs it's called type but in our code it's doc_type to avoid a name conflict). For some reason it looks like "doc_type" is appearing in API.params multiple times and thus multiple times in the docstring, figure out why that is. Is that a good launch-off point? :)

To start I'd clone the elasticsearch repo via git clone --branch 7.x https://github.com/elastic/elasticsearch at the same level as the elasticsearch-py repository and try running the script first, see what happens, then maybe add some print statements within API.params.

Good luck and feel free to ask more questions! 🚀

@mortenhauberg
Copy link
Author

I hope so. Starting to dive into.
Quick question - I have to find the same version for elastic/elasticsearch and for elastic/elasticsearch-py, right?

I kinda struggle with that.
I've tried having both on master and 7.x, but when I run python utils/generate_api.py I get a ton of changes. I don't assume that's intended?

@sethmlarson
Copy link
Contributor

Yeah I'd start by checking out matching branches, the 7.x branch on this repo is up to date. When you figure out the issue for the 7.x branch we can "forward-port" it to the master branch.
I'm working on getting master/8.0 up to date right now :)

@mortenhauberg
Copy link
Author

Okay. I'm on 7.x on both repositories.
After running the script I get the following. Is that intended?

$ git status
On branch 7.x
Your branch is up to date with 'origin/7.x'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   elasticsearch/client/__init__.py
        modified:   elasticsearch/client/cat.py
        modified:   elasticsearch/client/ccr.py
        modified:   elasticsearch/client/cluster.py
        modified:   elasticsearch/client/enrich.py
        modified:   elasticsearch/client/indices.py
        modified:   elasticsearch/client/ml.py
        modified:   elasticsearch/client/nodes.py
        modified:   elasticsearch/client/security.py
        modified:   elasticsearch/client/sql.py
        modified:   elasticsearch/client/tasks.py
        modified:   elasticsearch/client/transform.py
        modified:   elasticsearch/client/watcher.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        elasticsearch/client/async_search.py
        elasticsearch/client/autoscaling.py
        elasticsearch/client/eql.py

@sethmlarson
Copy link
Contributor

Hmm I wouldn't expect async_search or any of those new modules. Maybe you should try resetting that whole directory with git checkout -- elasticsearch/client and then run again, looks like you had elasticsearch/master checked out when you ran first.

@mortenhauberg
Copy link
Author

Exactly - that why I was confused.
Resetting and running it again gives me the exact same result.

Not really sure what I've messed up

$ git checkout -- elasticsearch/client
$ python3 utils/generate_api.py
$ git status
On branch 7.x
Your branch is up to date with 'origin/7.x'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   elasticsearch/client/__init__.py
        modified:   elasticsearch/client/cat.py
        modified:   elasticsearch/client/ccr.py
        modified:   elasticsearch/client/cluster.py
        modified:   elasticsearch/client/enrich.py
        modified:   elasticsearch/client/indices.py
        modified:   elasticsearch/client/ml.py
        modified:   elasticsearch/client/nodes.py
        modified:   elasticsearch/client/security.py
        modified:   elasticsearch/client/sql.py
        modified:   elasticsearch/client/tasks.py
        modified:   elasticsearch/client/transform.py
        modified:   elasticsearch/client/watcher.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        elasticsearch/client/async_search.py
        elasticsearch/client/autoscaling.py
        elasticsearch/client/eql.py

From elastic/elasticsearch

$ git status
On branch 7.x
Your branch is up to date with 'origin/7.x'.

nothing to commit, working tree clean

@sethmlarson
Copy link
Contributor

sethmlarson commented Mar 19, 2020

Ah, why don't you try checking out the elasticsearch/v7.6.1 tag via git checkout v7.6.1. Looks like there are new APIs for 7.7, we're only on 7.6. :)

@mortenhauberg
Copy link
Author

Much better - thanks!

Still some changes that look a bit weird. I'm not sure if that's intended

diff --git a/elasticsearch/client/__init__.py b/elasticsearch/client/__init__.py
index 7b4568e..842b598 100644
--- a/elasticsearch/client/__init__.py
+++ b/elasticsearch/client/__init__.py
@@ -318,11 +318,11 @@ class Elasticsearch(object):
         :arg refresh: If `true` then refresh the affected shards to make
             this operation visible to search, if `wait_for` then wait for a refresh
             to make this operation visible to search, if `false` (the default) then
-            do nothing with refreshes.  Valid choices: true, false, wait_for
+            do nothing with refreshes.   Valid choices: true, false, wait_for
         :arg routing: Specific routing value
         :arg timeout: Explicit operation timeout
         :arg version: Explicit version number for concurrency control
-        :arg version_type: Specific version type  Valid choices:
+        :arg version_type: Specific version type   Valid choices:
             internal, external, external_gte
         :arg wait_for_active_shards: Sets the number of shard copies
             that must be active before proceeding with the index operation. Defaults
@@ -374,17 +374,17 @@ class Elasticsearch(object):
             number
         :arg op_type: Explicit operation type. Defaults to `index` for
             requests with an explicit document ID, and to `create`for requests
-            without an explicit document ID  Valid choices: index, create
+            without an explicit document ID   Valid choices: index, create
         :arg pipeline: The pipeline id to preprocess incoming documents
             with
         :arg refresh: If `true` then refresh the affected shards to make
             this operation visible to search, if `wait_for` then wait for a refresh
             to make this operation visible to search, if `false` (the default) then
-            do nothing with refreshes.  Valid choices: true, false, wait_for
+            do nothing with refreshes.   Valid choices: true, false, wait_for
         :arg routing: Specific routing value
         :arg timeout: Explicit operation timeout
         :arg version: Explicit version number for concurrency control
-        :arg version_type: Specific version type  Valid choices:
+        :arg version_type: Specific version type   Valid choices:
             internal, external, external_gte
         :arg wait_for_active_shards: Sets the number of shard copies
             that must be active before proceeding with the index operation. Defaults

@sethmlarson
Copy link
Contributor

You can ignore those small changes, although I don't know why it'd increase from two spaces to 3 without a change in the template. Not a big problem though!

Now you can start investigating how the generate_api.py script works! :D

@mortenhauberg
Copy link
Author

I'll ignore it then.
Thank you for your help so far

@mortenhauberg
Copy link
Author

@sethmlarson I think I found the issue.

type is defined both url (here) and in params (here).

I found a solution for it by changing API.params in generate_api.py to the following.

@property
def params(self):
    parts = self.all_parts

    return chain(
        ((p, parts[p]) for p in parts if parts[p]["required"]),
        (("body", self.body), ) if self.body else (),
        ((p, parts[p]) for p in parts
            if not parts[p]["required"] and
            p not in self._def.get("params", {})),
        sorted(self._def.get("params", {}).items()),
    )

Adding the extra p not in self._def.get("params", {})

I don't feel comfortable enough to judge if that the best or even a proper solution.

Advice would be much appreciated.

@sethmlarson
Copy link
Contributor

@mortenhauberg That looks like a good lead! What happens when you generate the API with that additional clause? Are there more duplicated parameters elsewhere that it catches?

Thanks for working on this, I feel like we're getting close to a solution :)

@sethmlarson
Copy link
Contributor

I ran your change locally and it looks like it de-duplicated in a bunch of places properly which is great! However I think it'd be best if we kept the "upper" parameter rather than removing the upper one since all the parameters that are duplicated are both in the URL path and in the query params list.

@mortenhauberg
Copy link
Author

mortenhauberg commented Mar 20, 2020

Good.
Any pointers on how to remove the second occurrence 😄?

@sethmlarson
Copy link
Contributor

I think I found a solution, adding , key=lambda x: (x[0] not in parts, x[0]) to the sorted() call within API.params ensures that the non-required parameters are on the top of the non-required list.

    @property
    def params(self):
        parts = self.all_parts
        params = self._def.get("params", {})
        return chain(
            ((p, parts[p]) for p in parts if parts[p]["required"]),
            (("body", self.body),) if self.body else (),
            ((p, parts[p]) for p in parts
             if not parts[p]["required"] and
             p not in params),
            sorted(params.items(), key=lambda x: (x[0] not in parts, x[0])),
        )

@sethmlarson
Copy link
Contributor

Could you create a new branch against the 7.x branch, add this change, and then re-run the generator? I'll review that and commit it to the 7.x branch and forward-port to the master branch :)

@mortenhauberg
Copy link
Author

Sure thing.
Do I commit the files created in elasticsearch/client or is that part of the pipeline?

@mortenhauberg
Copy link
Author

I've created #1169 now

@sethmlarson
Copy link
Contributor

I'd make two commits, one for utils/generate_api.py and one for the results of running the script. But tbh it doesn't matter much either way, both work! :D

@mortenhauberg
Copy link
Author

I now face the issue where there's multiple spaces.. I'll dig into that

diff --git a/elasticsearch/client/__init__.py b/elasticsearch/client/__init__.py
index 7b4568e..d0784d7 100644
--- a/elasticsearch/client/__init__.py
+++ b/elasticsearch/client/__init__.py
@@ -318,11 +318,11 @@ class Elasticsearch(object):
         :arg refresh: If `true` then refresh the affected shards to make
             this operation visible to search, if `wait_for` then wait for a refresh
             to make this operation visible to search, if `false` (the default) then
-            do nothing with refreshes.  Valid choices: true, false, wait_for
+            do nothing with refreshes.   Valid choices: true, false, wait_for
         :arg routing: Specific routing value
         :arg timeout: Explicit operation timeout
         :arg version: Explicit version number for concurrency control
-        :arg version_type: Specific version type  Valid choices:
+        :arg version_type: Specific version type   Valid choices:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants