[Feature] Optimize for Redshift Serverless #633

Open
3 tasks done
jaswanthikolla opened this issue Jul 15, 2024 · 1 comment
Labels
pkg:dbt-redshift Issue affects dbt-redshift type:enhancement New feature request

Comments

@jaswanthikolla

jaswanthikolla commented Jul 15, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion

Describe the feature

In Redshift Serverless, queries are billed for a minimum of 60 seconds, so it is better to batch them. For example, say you are running a model with its full dependency chain. You run system table queries against pg_namespace and information_schema.tables at T0, those results are processed, and the model query runs from T1 to T2. You are billed from T0 to T2 instead of just T1 to T2, and that window includes a lot of IO time. The same thing happens for every model in the dependency chain.
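To make the T0/T1/T2 example concrete, here is a rough sketch of the arithmetic. It assumes a deliberately simplified billing model (a flat 60-second minimum per billed window, as described above) and uses made-up timestamps; `billed_seconds` is a hypothetical helper, not part of dbt or any AWS SDK, and real charges also depend on RPU capacity.

```python
# Simplified billing model, assumed from the description above:
# each billed window costs at least 60 seconds.
MIN_BILLED_SECONDS = 60.0


def billed_seconds(start: float, end: float) -> float:
    """Return the billed duration for one window under the assumed model."""
    return max(MIN_BILLED_SECONDS, end - start)


# Hypothetical timeline, in seconds since the run began.
t0 = 0.0    # system table queries start
t1 = 45.0   # model query starts (after metadata is fetched and processed)
t2 = 120.0  # model query finishes

with_metadata_wait = billed_seconds(t0, t2)  # window includes the metadata round trips
model_only = billed_seconds(t1, t2)          # window if metadata had been fetched up front

print(f"billed with metadata wait: {with_metadata_wait:.0f}s")
print(f"billed for model only:     {model_only:.0f}s")
```

With these illustrative numbers, the window shrinks from 120 to 75 billed seconds for a single model; the saving repeats for every model in the chain.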

The proposal is to run these system table queries once at startup, during dependency resolution, so that they are not issued again when the actual models run (which currently leaves Redshift waiting).
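A minimal sketch of the idea, assuming the metadata can be fetched in one batched query and held in memory for the rest of the run. `MetadataCache`, `warm`, and `relation_exists` are hypothetical names for illustration and do not reflect how dbt-redshift is actually structured; the fetch function here is a stand-in for a real query against information_schema.tables.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Relation = Tuple[str, str]  # (schema name, table name)


class MetadataCache:
    """Fetch relation metadata once, then answer lookups from memory."""

    def __init__(self, fetch_all: Callable[[], List[Relation]]) -> None:
        self._fetch_all = fetch_all
        self._by_schema: Optional[Dict[str, Set[str]]] = None

    def warm(self) -> None:
        """Run the single batched metadata query and index the result."""
        by_schema: Dict[str, Set[str]] = {}
        for schema, table in self._fetch_all():
            by_schema.setdefault(schema, set()).add(table)
        self._by_schema = by_schema

    def relation_exists(self, schema: str, table: str) -> bool:
        """Answer from the cache, warming it on first use if needed."""
        if self._by_schema is None:
            self.warm()
        assert self._by_schema is not None
        return table in self._by_schema.get(schema, set())


# Stand-in for the warehouse round trip; counts how often it is called.
fetch_calls = 0


def fake_fetch_all() -> List[Relation]:
    global fetch_calls
    fetch_calls += 1
    return [("analytics", "orders"), ("analytics", "customers")]


cache = MetadataCache(fake_fetch_all)
cache.warm()  # done once, at startup

# Per-model lookups no longer touch the warehouse.
found_orders = cache.relation_exists("analytics", "orders")
found_missing = cache.relation_exists("analytics", "payments")
```

One caveat with this approach: models create and replace relations during the run, so a real implementation would need to update the cache as each model finishes rather than treat the startup snapshot as final.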

Pulled from other Issues:

Describe alternatives you've considered

Multiple workgroups with different RPU capacities, but that is outside the scope of dbt.

Who will this benefit?

All Redshift Serverless users. This could save millions of dollars across the industry.

Are you interested in contributing this feature?

I am 3 days into dbt, but yes, I can!

Anything else?

Maybe you can take this to the next level and use SQLite to cache the system table info locally.
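As a rough illustration of the SQLite idea, the sketch below stores relation metadata in a local table using Python's built-in `sqlite3` module. The table layout and the helper names (`open_cache`, `store_relations`, `tables_in_schema`) are assumptions made up for this example; the rows would come from the real system table queries.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relations ("
        " schema_name TEXT NOT NULL,"
        " table_name TEXT NOT NULL,"
        " PRIMARY KEY (schema_name, table_name))"
    )
    return conn


def store_relations(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Write a batch of (schema, table) rows fetched from the warehouse."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO relations (schema_name, table_name) VALUES (?, ?)",
            rows,
        )


def tables_in_schema(conn: sqlite3.Connection, schema: str) -> List[str]:
    """Look up cached table names without querying the warehouse."""
    cur = conn.execute(
        "SELECT table_name FROM relations WHERE schema_name = ? ORDER BY table_name",
        (schema,),
    )
    return [row[0] for row in cur.fetchall()]


conn = open_cache()
store_relations(
    conn,
    [("analytics", "orders"), ("analytics", "customers"), ("staging", "raw_orders")],
)
analytics_tables = tables_in_schema(conn, "analytics")
```

A file-backed cache could survive between runs, but it would then need some invalidation rule, since the warehouse can change outside of dbt.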

@jaswanthikolla jaswanthikolla added type:enhancement New feature request triage:product In Product's queue labels Jul 15, 2024
@amychen1776 amychen1776 removed the triage:product In Product's queue label Jul 25, 2024
@amychen1776
Contributor

@jaswanthikolla Thank you so much for opening the three issues! And welcome to dbt :) In the future, feel free to group these similar requests together!

@amychen1776 amychen1776 changed the title [Feature] Batch the System table Queries to Optimize for Serverless [Feature] Optimize for Serverless Jul 25, 2024
@amychen1776 amychen1776 changed the title [Feature] Optimize for Serverless [Feature] Optimize for Redshift Serverless Aug 28, 2024
@mikealfare mikealfare added the pkg:dbt-redshift Issue affects dbt-redshift label Jan 15, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-redshift Jan 23, 2025
mikealfare pushed a commit that referenced this issue Jan 23, 2025
* Clean up changelog on main

* Bumping version to 1.8.0a1

* Code quality cleanup