ES Search Query Collect All Response #1631

noah-paige · 2024-10-10T14:15:50Z

Feature or Bugfix

Bugfix

Detail

For catalog_indexer_task, ensure we collect all hits from the query response when the with_deletes option is used:
- Increase the query size to 1000 results (the default is 10)
- Add logic to keep querying until all hits are collected when there are more hits than the query size limit (i.e. > 1000)
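
The two changes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code merged in this PR: it assumes offset (`from`/`size`) pagination and a client exposing `search(index=..., body=...)`. `FakeClient` is a hypothetical in-memory stand-in so the sketch runs without a cluster, and the signature of `search_all` here is a guess. A real implementation might use `search_after` or scroll instead, since offset pagination is capped by the index's result window (10,000 by default).

```python
from typing import Any, Dict, List

QUERY_SIZE = 1000  # page size named in this PR; the engine default is 10


class FakeClient:
    """Hypothetical stand-in for a search client, for illustration only."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self.docs = docs
        self.calls = 0

    def search(self, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        self.calls += 1
        start = body.get("from", 0)
        size = body.get("size", 10)
        return {"hits": {"hits": self.docs[start:start + size]}}


def search_all(client: Any, index: str, query: Dict[str, Any],
               size: int = QUERY_SIZE) -> List[Dict[str, Any]]:
    """Keep querying until a page comes back shorter than `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    collected: List[Dict[str, Any]] = []
    while True:
        # Assumption: offset pagination, using the number of hits seen so far.
        body = {"query": query, "from": len(collected), "size": size}
        response = client.search(index=index, body=body)
        page = response.get("hits", {}).get("hits", [])
        collected.extend(page)
        if len(page) < size:  # a short (or empty) page means nothing is left
            return collected


# Usage: 25 documents with a page size of 10 need three search calls.
client = FakeClient([{"_id": str(i)} for i in range(25)])
hits = search_all(client, "my-index", {"match_all": {}}, size=10)
```

Note that with this stopping rule, a result count that is an exact multiple of the page size costs one extra, empty request.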

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on OWASP 10.

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no eval or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


noah-paige · 2024-10-10T21:12:28Z

TESTING - Tested both locally and in AWS for the following:

- Test startReindexTask successfully runs catalog_indexer_task when withDeletes=False
- Test startReindexTask successfully runs catalog_indexer_task when withDeletes=True
- Test startReindexTask successfully runs catalog_indexer_task when withDeletes=True, the number of objects to delete is > 10, and QUERY_SIZE is 10 (i.e. multiple search calls are required to collect all responses)
- UI button works to invoke startReindexCatalog as Admin with the withDeletes Switch set to either True or False
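
To illustrate why the third case matters, here is a small sketch. It assumes (this is not stated in the PR) that with_deletes removes index documents whose ids no longer exist in the source database. `find_stale_ids` and the document ids are hypothetical names made up for this example.

```python
from typing import Any, Dict, Iterable, List


def find_stale_ids(indexed_hits: Iterable[Dict[str, Any]],
                   current_ids: Iterable[str]) -> List[str]:
    """Return ids present in the index but absent from the source of truth."""
    current = set(current_ids)
    return sorted({hit["_id"] for hit in indexed_hits} - current)


# Twelve indexed documents, only one of which still exists upstream.
all_hits = [{"_id": f"doc-{i:02d}"} for i in range(12)]
still_valid = ["doc-00"]

# With every hit collected, all eleven stale documents are found.
stale_from_all_hits = find_stale_ids(all_hits, still_valid)

# With only the first page of 10 hits, two stale documents are missed.
stale_from_first_page = find_stale_ids(all_hits[:10], still_valid)
```

Under that assumption, reading only a single page of results would leave stale entries in the catalog whenever more documents need deleting than fit in one page.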


dlpzx

Looks good. I just want to confirm the new way we are dealing with the response of search.
response = {'hits': {'hits': [{'_id': 1, ...}, ...]}}

Before:
- we extract docs.get('hits', {}).get('hits', []) in the catalog indexer task from search
- we extract hits-hits in the FE from search - with the Catalog DataSearch props...

After:
- we directly get the hits in the catalog indexer task from search_all
- we extract hits-hits in the FE from search - we do not change anything, so as not to break the Catalog view
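
The response shape discussed in this comment can be shown with a short sketch. The sample response and the `extract_hits` helper are made up for illustration; only the `docs.get('hits', {}).get('hits', [])` expression comes from the comment itself.

```python
from typing import Any, Dict, List


def extract_hits(docs: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Unwrap the nested hits list, tolerating missing keys."""
    return docs.get("hits", {}).get("hits", [])


# Shape of a raw search response, as described in the comment.
response = {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}}

# "Before": the caller unwraps hits -> hits itself.
before = extract_hits(response)

# A response without hits yields an empty list instead of raising KeyError.
empty = extract_hits({})
```

In the "After" case the catalog indexer task would skip this unwrapping step, because search_all is described as returning the list of hits directly.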

noah-paige · 2024-10-29T13:18:40Z

Correct! The FE Component we use automatically handles all the pagination for us


### Feature or Bugfix
- Security

### Detail
* get-parameter CloudfrontDistributionDomainName from us-east-1 (#1687)
* Added Token Validations (#1682)
* add warning to untrust data.all account when removing an environment (#1685)
* add custom domain support for apigw (#1679)
* Lambda Event Logs Handling (#1678)
* Upgrade Spark version to 3.3 (#1675) - a0c63a4
* ES Search Query Collect All Response (#1631)
* Extend Tenant Perms Coverage (#1630)
* Limit Response info dataset queries (#1665)
* Add Removal Policy Retain to Bucket Policy IaC (#1660)
* log API handler response only for LOG_LEVEL DEBUG. Set log level INFO for prod deployments (#1662)
* Add permission checks to markNotificationAsRead + deleteNotification (#1654)
* Added error view and unified utility to check tenant user (#1657)
* Userguide signout flow (#1629)

### Relates
- Security release

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: Petros Kalos <[email protected]>

noah-paige added 2 commits October 9, 2024 14:17

collect all responses from paginated response catalog indexer opensearch query

0982480

Fix tests on catalog index task

6ebb436

noah-paige added 2 commits October 10, 2024 17:14

Make sure Switch value is in sync with withDeletes param

d84be9d

Merge brant push os fix/catalog-indexer-paginationch 'os-main' into fix/catalog-indexer-pagination

a0e588c

noah-paige self-assigned this Oct 22, 2024

noah-paige marked this pull request as ready for review October 23, 2024 21:43

dlpzx self-requested a review October 25, 2024 07:01

dlpzx approved these changes Oct 29, 2024


noah-paige merged commit 92b591f into main Oct 29, 2024
9 checks passed

dlpzx mentioned this pull request Nov 6, 2024

2.6.1 Security features #1686

Merged

dlpzx deleted the fix/catalog-indexer-pagination branch November 22, 2024 11:21
