Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert CEL expressions to SQL filters to leverage the database for querying results and records #259

Closed
alan-ghelardi opened this issue Oct 29, 2022 · 7 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@alan-ghelardi
Copy link
Contributor

Feature request

Currently, the API server evaluates CEL expressions in memory to filter results and records. Although quite powerful, this approach has performance drawbacks considering large scale: it depends upon reading results and records in batches and ultimately, might end up scanning the whole database. Moreover, the approach in question doesn't leverage the database capabilities for querying data - Postgres provides a very powerful mechanism for querying jsonb columns, including support for indexing.

The initial proposal is creating a CEL to SQL interpreter that should attempt to translate CEL expressions into SQL filters as much as possible to enrich the list operations. Thus, the following CEL expression:

data.status.completionTime > "2022-10-29T10:00:20.021-05:00"

Results in the following SQL filter:

(data->status->>completionTime)::TIMESTAMP WITH TIME ZONE > '2022-10-29T10:00:20.021-05:00'::TIMESTAMP WITH TIME ZONE

This feature must be an opt-in initially. If the interpreter cannot convert the CEL expression to a valid Postgres SQL expression, the API server must fall back to the current behavior and evaluates the CEL program in memory.

The interpreter must distinguish correctly results and records and yield SQL filters accordingly. For instance, select expressions targeting the record's data field must be translated into JSON function calls, whereas select expressions targeting the result's summary field must query the respective columns in the results table (e.g. summary.type corresponds to results_summary_type in the SQL table). Thus, users who are listing results and records keep dealing with plain CEL expressions and the API server attempts to optimize the queries as much as possible.

Use case

This features targets all list operations, by enhancing the API's server performance. Once the mechanism is mature enough, a next step could be considering to create an index for record's labels. Then, users may query data faster by relying on the same approach used to fetch Kubernetes objects via label selectors.

@alan-ghelardi alan-ghelardi added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 29, 2022
@alan-ghelardi
Copy link
Contributor Author

/assign @alan-ghelardi

@alan-ghelardi
Copy link
Contributor Author

I'll have some spare time next week for experimentations and I'd like to work on a POC to test this approach in practice.

@alan-ghelardi
Copy link
Contributor Author

Here is an ongoing proof of concept to implement this feature. I'm still enhancing the code and once it's sufficiently robust, I intend to move the interpreter to the Tekton Results and open a pull request making a more concrete proposal.

@tekton-robot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@alan-ghelardi
Copy link
Contributor Author

/remove-lifecycle stale

@alan-ghelardi
Copy link
Contributor Author

Solved via #404
/close

@tekton-robot
Copy link

@alan-ghelardi: Closing this issue.

In response to this:

Solved via #404
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-project-automation github-project-automation bot moved this from Todo to Done in Tekton Community Roadmap Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
Status: Done
Development

No branches or pull requests

2 participants