Paging with Pagetoken & Filter doesn't work properly #379

manuelwallrapp · 2023-03-06T07:17:09Z

Expected Behavior

If I ListRecords with this filter:
{
"parent": "-/results/-",
"page_size": 10,
"filter": "data_type=='tekton.dev/v1beta1.PipelineRun'"
}

I get the first time the correct answer:
{
"results": [ .... ]
"nextPageToken": "Q2lSbU9HVm1NRGMwT1Mwd05UVTFMVFExWldJdE9XRTRaaTB6TjJZeVpEbGlNR1kwWVRrU0syUmhkR0ZmZEhsd1pUMDlKM1JsYTNSdmJpNWtaWFl2ZGpGaVpYUmhNUzVRYVhCbGJHbHVaVkoxYmlj"
}

The next Request I do with the received page_token:

{
"parent": "-/results/-",
"page_size": 10,
"filter": "data_type=='tekton.dev/v1beta1.PipelineRun'",
"page_token": "Q2lSbU9HVm1NRGMwT1Mwd05UVTFMVFExWldJdE9XRTRaaTB6TjJZeVpEbGlNR1kwWVRrU0syUmhkR0ZmZEhsd1pUMDlKM1JsYTNSdmJpNWtaWFl2ZGpGaVpYUmhNUzVRYVhCbGJHbHVaVkoxYmlj"
}

I would expect another token when I have more than 20 Datasets (I have in this case around 150 PipelineRuns)
{
"results": [ .... ]
"nextPageToken": "Q2lSbU9..... just another next_page_token"
}

Actual Behavior

When I send a ListRecordRequest with the page_token, I receive an empty nextPageToken in case of PipelineRuns in case of TaskRuns not.:
{
"results": [ .... ]
"nextPageToken": ""
}

Steps to Reproduce the Problem

Do Paging with a filter on field type. After the second page, the API just delivers an empty page token

I have this effect with Postman over GRPC and REST and in my Java client over REST.

I built my own master branch version of tekton-results.

manuelwallrapp · 2023-03-08T17:39:33Z

@alan-ghelardi @adambkaplan @avinal do you know if I do something wrong or can you confirm this issue? For me this is a blocker. When I page through the results I even get duplicates. Also when I add the page_size parameter, the page_size works, but it delivers me duplicates.

For example I call this URL:
/apis/results.tekton.dev/v1alpha2/parents/esta-tekton-predev/results/-/records?filter=data_type == "tekton.dev/v1beta1.PipelineRun"&page_size=10

Then I get per page several duplicated entries. Like the same PipelineRun 2 times. Would be great to fix this.

alan-ghelardi · 2023-03-08T19:54:24Z

I'm about to start working on integrating the POC mentioned here to the Results codebase. I have the impression that this issue might be strongly connected to the way how Results treat filters nowadays, basically, creating in-memory batches and applying the CEL filters rather than leveraging database capabilities. Reviewing pagination would be the next step.

alan-ghelardi · 2023-03-08T19:54:32Z

/assign

alan-ghelardi · 2023-03-08T19:55:50Z

Having a E2E test case for pagination is a must have by the way.

manuelwallrapp · 2023-03-09T07:06:34Z

@alan-ghelardi thank you for your Feedback. Great you are working on a POC for CEL to SQL. I thought it is already implemented like this. At the moment we are a little thin with resources, but I am looking forward to also contribute to tekton-results.

manuelwallrapp · 2023-03-09T09:56:19Z

@alan-ghelardi when I have in the results table for example 133 Datasets for a particular parent. In my case esta-tekton-predev I check the amount like this.

select count(*) from results t where t.parent='esta-tekton-predev'

I do a REST Api call like this:
/apis/results.tekton.dev/v1alpha2/parents/esta-tekton-predev/results

And then page through the results. But if I request with Postman or my Java Client Page by page, I only receive 110 Results. And even more weird, I receive duplicates inconsistently. So if I do a distinct, there are for example 55 Objects left, sometimes 72, sometimes 45.

There is something very wrong with the API. I don't know how you use it. The Watcher works great, but the API is not usable for queries.

alan-ghelardi · 2023-03-09T16:14:55Z

Yep, definitely it's broken. Currently, at Nubank, we use the watcher for archiving and garbage collecting runs as well as the API to implement our re-run mechanism. In the re-run we receive the result ID via GitHub check runs and use it to find the PipelineRun and all TaskRuns associated to it. This request doesn't depend upon pagination though. In practice, we aren't using the API for querying arbitrary objects due to the existing limitations - we're working on a custom dashboard as well as on an API gateway that will rely on the Results GRPC server to query objects, but those fronts still depend on the feature I mentioned.

There are two major features I'd like to implement for the next release: the CEL to SQL interpreter and the feature described at #343. Actually this one is already implemented in our internal platform but still need to be merged into the upstream project. By combining them, you'll be able to annotate a PipelineRun with arbitrary keys (say, repo=foo/bar) and run a query using the filter: summary.annotations["repo"] == "foo/bar" and select all corresponding results.

I keep you posted on the progress of this initiative.

manuelwallrapp · 2023-03-09T16:40:59Z

Ok, great. I suppose, with the CEL to SQL interpreter you will also fix the Paging? What is the time window you are planning the next release (weeks, months)?
For us the paging fix would solve the problem and we could use tekton-results. That would be enough for us for the moment.

alan-ghelardi · 2023-03-10T15:56:57Z

We talked about trying to keep monthly releases, although it depends upon various factors. Anyhow, I'd like to move the POC over the Results codebase next week (if the time permits). Then I'd like to look into the pagination issue.

manuelwallrapp · 2023-03-14T09:03:53Z

That would be great, thank you @alan-ghelardi. So we wait for your bugfix.

alan-ghelardi · 2023-04-21T15:42:15Z

@manuelwallrapp #404 contains the features I mentioned earlier. The pr needs to be reviewed and discussed with project members, so I'd expect some time to get it approved and merged, but we're using the features in our production cluster. I'd like to write a comprehensive documentation on new search capabilities in a separate pull request, but if you want to be a beta tester, I can provide you instructions to proceed.

alan-ghelardi · 2023-04-21T15:45:43Z

To clarify, that pull request implements support for converting CEL expressions to SQL clauses internally and redesign the pagination to work in this new scenario. Nothing changed in the way how you use pagination (you'll keep passing a page token to the API), but the mechanism works properly now with the newly introduced cel2sql engine.

manuelwallrapp · 2023-04-24T12:31:01Z

Thank you @alan-ghelardi, I will test the paging in the next few days and will write a comment.

alan-ghelardi · 2023-04-25T16:11:25Z

Awesome.

By the way #426 was merged and #404 contains this patch. With those features in place, you can set custom key/values to query your PipelineRuns. For instance, you can annotate the PipelineRun with the repository it belongs to:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: hello-run-
  annotations:
    results.tekton.dev/resultAnnotations: |-
      {"repo": "tektoncd/results", "commit": "1a6b908"}
    results.tekton.dev/recordSummaryAnnotations: |-
      {"foo": "bar"}

Then, to find them you can use filters such as:

annotations["repo"] == "tektoncd/results" && summary.annotations["foo"] == "bar"

Notice that those fields are only available for ListResults operations.

You can create gin indexes on those fields to take advantage of the Postgres support for querying JSON objects:

CREATE INDEX IF NOT EXISTS results_annotations  ON results USING gin (annotations jsonb_path_ops);

CREATE INDEX IF NOT EXISTS record_summary_annotations  ON results USING gin (recordsummary_annotations jsonb_path_ops);

Behind the scenes, the CEL expression is converted to JSON selectors to leverage Postgres capabilities.

manuelwallrapp added the kind/bug Categorizes issue or PR as related to a bug. label Mar 6, 2023

tekton-robot assigned alan-ghelardi Mar 8, 2023

alan-ghelardi added this to the Enhance the Results API search capabilities milestone Mar 10, 2023

alan-ghelardi mentioned this issue Apr 21, 2023

Implement CEL to SQL interpreter to enhance searches #404

Merged

7 tasks

alan-ghelardi mentioned this issue Apr 28, 2023

CEL Filters across Results return multiple duplicate records #450

Closed

tekton-robot closed this as completed in #404 Jun 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Paging with Pagetoken & Filter doesn't work properly #379

Paging with Pagetoken & Filter doesn't work properly #379

manuelwallrapp commented Mar 6, 2023

manuelwallrapp commented Mar 8, 2023

alan-ghelardi commented Mar 8, 2023

alan-ghelardi commented Mar 8, 2023

alan-ghelardi commented Mar 8, 2023

manuelwallrapp commented Mar 9, 2023

manuelwallrapp commented Mar 9, 2023

alan-ghelardi commented Mar 9, 2023

manuelwallrapp commented Mar 9, 2023

alan-ghelardi commented Mar 10, 2023

manuelwallrapp commented Mar 14, 2023

alan-ghelardi commented Apr 21, 2023

alan-ghelardi commented Apr 21, 2023

manuelwallrapp commented Apr 24, 2023

alan-ghelardi commented Apr 25, 2023 •

edited

Loading

Paging with Pagetoken & Filter doesn't work properly #379

Paging with Pagetoken & Filter doesn't work properly #379

Comments

manuelwallrapp commented Mar 6, 2023

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

manuelwallrapp commented Mar 8, 2023

alan-ghelardi commented Mar 8, 2023

alan-ghelardi commented Mar 8, 2023

alan-ghelardi commented Mar 8, 2023

manuelwallrapp commented Mar 9, 2023

manuelwallrapp commented Mar 9, 2023

alan-ghelardi commented Mar 9, 2023

manuelwallrapp commented Mar 9, 2023

alan-ghelardi commented Mar 10, 2023

manuelwallrapp commented Mar 14, 2023

alan-ghelardi commented Apr 21, 2023

alan-ghelardi commented Apr 21, 2023

manuelwallrapp commented Apr 24, 2023

alan-ghelardi commented Apr 25, 2023 • edited Loading

alan-ghelardi commented Apr 25, 2023 •

edited

Loading