Modify Hive external table scan scheduling strategy #1394

Merged 2 commits on Nov 23, 2021

@stdpain stdpain commented Nov 18, 2021

In the previous scan scheduling, when a table had multiple partitions, every partition was scanned with equal priority.
This kept scheduling fair across scanners, but it also meant that too many HDFS files were open at the same time.
In some scenarios several thousand HDFS file handles could be held at once, which increases the load on HDFS.

Now we limit the number of open files for Hive external tables with max_hdfs_file_instance. If the number of opened files is greater than this limit, the scanner is pushed back to the pending list.

We can't have all scanners in the pending state, though: we need to make sure at least one thread on each scan node can be running.
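
The scheduling rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `OpenFileBudget`, `Scanner`, `ScanNode`, `schedule()` and `finish_one()` are hypothetical names, the `limit` field only stands in for max_hdfs_file_instance, and the exact comparison against the limit is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical process-wide budget of open HDFS files; `limit` plays the
// role of max_hdfs_file_instance.
struct OpenFileBudget {
    std::size_t limit;
    std::size_t opened;
};

// Hypothetical scanner: starting it holds one HDFS file handle.
struct Scanner {
    int id;
    bool file_open;
};

// Hypothetical scan node that shares the budget with other scan nodes.
class ScanNode {
public:
    ScanNode(OpenFileBudget& budget, int num_scanners) : _budget(budget) {
        for (int i = 0; i < num_scanners; ++i) {
            _pending.push_back(Scanner{i, false});
        }
    }

    // Visit each pending scanner once; returns how many were started.
    std::size_t schedule() {
        std::size_t started = 0;
        for (std::size_t n = _pending.size(); n > 0; --n) {
            Scanner s = _pending.front();
            _pending.pop_front();
            // Assumption: "at or above the limit" counts as over the limit.
            const bool over_limit = _budget.opened >= _budget.limit;
            // Over the limit: push the scanner back to the pending list,
            // unless that would leave this node with nothing running.
            if (over_limit && !_running.empty()) {
                _pending.push_back(s);
                continue;
            }
            s.file_open = true;
            ++_budget.opened;
            _running.push_back(s);
            ++started;
        }
        return started;
    }

    // Simulate one running scanner finishing and closing its file.
    void finish_one() {
        if (_running.empty()) return;
        assert(_budget.opened > 0);
        _running.pop_back();
        --_budget.opened;
    }

    std::size_t running() const { return _running.size(); }
    std::size_t pending() const { return _pending.size(); }

private:
    OpenFileBudget& _budget;
    std::deque<Scanner> _pending;
    std::vector<Scanner> _running;
};
```

In this sketch, two scan nodes sharing a budget of 2 behave as described: the first node starts two scanners and keeps the rest pending, while the second node, although the budget is already exhausted, still starts exactly one scanner so that it is never left fully pending.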

stdpain changed the title from "[WIP] Modify Hive external table scan scheduling logic" to "Modify Hive external table scan scheduling logic" Nov 21, 2021
stdpain changed the title from "Modify Hive external table scan scheduling logic" to "Modify Hive external table scan scheduling strategy" Nov 22, 2021
dirtysalt previously approved these changes Nov 22, 2021
be/src/common/config.h (outdated, resolved)
stdpain commented Nov 22, 2021

run starrocks_be_unittest

3 participants