Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support pagination on task_history #15047

Merged
merged 3 commits into from
Mar 21, 2024

Conversation

ZhiHanZ
Copy link
Collaborator

@ZhiHanZ ZhiHanZ commented Mar 20, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Previously when single request return body is large, it would show grpc error, this Pull Request support the following things:

  1. task_name, schedule_time filter push down, allow to use those select information to filter unneeded rows on cloud control services.
  2. add next_page_token and page_size parameter, allow to control page size and pagination of the single request.

NOTICE: I also tried to use limit push_down, but found that it does NOT work, need some deep dive on the issue

fix #14929

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Mar 20, 2024
@ZhiHanZ ZhiHanZ requested a review from hantmac March 20, 2024 11:09
@BohuTANG
Copy link
Member

NOTICE: I also tried to use limit push_down, but found that it does NOT work, need some deep dive on the issue

We should make it works:) Think a user has many tasks all scheduler is in 1 second, the rows is very large, and run:
select * from system.task_history is no sense anymore.

So, here the better way is:

  1. select * from system.task_history returns rows with a LIMIT(Say 10000)
  2. select * from system.task_history limit N returns N rows order by the time.

@ZhiHanZ
Copy link
Collaborator Author

ZhiHanZ commented Mar 21, 2024

NOTICE: I also tried to use limit push_down, but found that it does NOT work, need some deep dive on the issue

We should make it works:) Think a user has many tasks all scheduler is in 1 second, the rows is very large, and run: select * from system.task_history is no sense anymore.

So, here the better way is:

  1. select * from system.task_history returns rows with a LIMIT(Say 10000)
  2. select * from system.task_history limit N returns N rows order by the time.

first limitation is easy to limit on server side, but for the second part, we need the push_down_info struct to pass limit number to the storage table source, but for now it does not work

https://github.com/datafuselabs/databend/pull/15047/files#diff-d4c61e93b55aeeb6dc42f6eef504cd58c6bdf2e3337d6be71aeeec07c4365973R145-R148

@BohuTANG
Copy link
Member

NOTICE: I also tried to use limit push_down, but found that it does NOT work, need some deep dive on the issue

We should make it works:) Think a user has many tasks all scheduler is in 1 second, the rows is very large, and run: select * from system.task_history is no sense anymore.
So, here the better way is:

  1. select * from system.task_history returns rows with a LIMIT(Say 10000)
  2. select * from system.task_history limit N returns N rows order by the time.

first limitation is easy to limit on server side, but for the second part

This limitation is ok if we give a default number for it.

@ZhiHanZ ZhiHanZ requested a review from flaneur2020 March 21, 2024 02:05
Copy link
Member

@hantmac hantmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@ZhiHanZ ZhiHanZ enabled auto-merge March 21, 2024 02:27
@ZhiHanZ ZhiHanZ added this pull request to the merge queue Mar 21, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Mar 21, 2024
@BohuTANG BohuTANG merged commit 341fb34 into databendlabs:main Mar 21, 2024
71 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: [1002=>status: OutOfRange] querying the system.task_history table failed
3 participants