Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scroll search with long scroll_id #1044

Closed
c-leroy opened this issue Jun 23, 2020 · 4 comments
Closed

Scroll search with long scroll_id #1044

c-leroy opened this issue Jun 23, 2020 · 4 comments

Comments

@c-leroy
Copy link

c-leroy commented Jun 23, 2020

Reference documentation from https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/search_operations.html tells us to use scroll_id parameter in scroll() parameters. This leads to a Elasticsearch error in case scroll_id is too large, since - in this case - it is sent in request URI, which overcomes elasticsearch limits.

Example from https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/search_operations.html is :

$client = ClientBuilder::create()->build();
$params = [
    'scroll' => '30s',          // how long between scroll requests. should be small!
    'size'   => 50,             // how many results *per shard* you want back
    'index'  => 'my_index',
    'body'   => [
        'query' => [
            'match_all' => new \stdClass()
        ]
    ]
];

// Execute the search
// The response will contain the first batch of documents
// and a scroll_id
$response = $client->search($params);

// Now we loop until the scroll "cursors" are exhausted
while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {

    // **
    // Do your work here, on the $response['hits']['hits'] array
    // **

    // When done, get the new scroll_id
    // You must always refresh your _scroll_id!  It can change sometimes
    $scroll_id = $response['_scroll_id'];

    // Execute a Scroll request and repeat
    $response = $client->scroll([
            'scroll_id' => $scroll_id,  //...using our previously obtained _scroll_id
            'scroll'    => '30s'        // and the same timeout window
        ]
    );
}

After viewing src/Elasticsearch/Endpoints/Scroll.php, we can see that :

  1. there is a warning about using scroll_id main parameter, that we didn't see (we are using kubernetes, maybe is it misconfigured?)
  2. after a little reading inside, we guessed we could use the "body" parameter which would then end up with a sane URI using POST method and a body containing the needed scroll_id. It worked.

Correct form of scroll() with current version to avoid URI overflow :

      $response = $client->scroll([
        'body' => [
          'scroll_id' => $scroll_id,
          'scroll' => '30s'
        ]
      ]);

The same goes for deletion, the only way to delete a large scroll_id is :

      $client->clearScroll([
        'body' => [
          'scroll_id' => $scroll_id
        ]
      ]);

We are using elasticsearch-php 7.4.1

Regards

@ezimuel
Copy link
Contributor

ezimuel commented Jul 17, 2020

@c-leroy thanks for reporting this. The @trigger_error() reported in Elasticsearch\Endpoints\Scroll is a warning about the usage of scroll_id in the body.
We used the @ operator of PHP to suppress error message because of BC break in 7.x, here the full story. If you want to capture this "silent deprecation error" you should use a custom erro handler of PHP, as follows:

function myErrorHandler($errno, $errstr, $errfile, $errline)
{
    switch ($errno) {
        case E_USER_DEPRECATED:
            printf("PHP Deprecated: %s in %s on line %d", $errstr, $errfile, $errline);
            break;
    }
    return true;
}

$oldErrorHandler = set_error_handler("myErrorHandler");
@trigger_error('This is a deprecated message', E_USER_DEPRECATED);

I'll update the documentation using scroll_id and scroll inside the body request.

@ezimuel
Copy link
Contributor

ezimuel commented Jul 17, 2020

@c-leroy I created #1052 to fix the documentation. Let me know what do you think, thanks!

@c-leroy
Copy link
Author

c-leroy commented Jul 17, 2020

@ezimuel Thanks for the explanation about @trigger_error().
I'm OK with #1052
Regards

@ezimuel
Copy link
Contributor

ezimuel commented Jul 17, 2020

Solved in #1052.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants