-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDAP-21087] Optimize spanner create table operations #15743
base: develop
Are you sure you want to change the base?
Conversation
cdap-data-fabric/src/main/java/io/cdap/cdap/spi/data/sql/PostgreSqlStructuredTableAdmin.java
Outdated
Show resolved
Hide resolved
cdap-data-fabric/src/main/java/io/cdap/cdap/spi/data/sql/PostgreSqlStructuredTableAdmin.java
Outdated
Show resolved
Hide resolved
cdap-data-fabric/src/main/java/io/cdap/cdap/store/StoreDefinition.java
Outdated
Show resolved
Hide resolved
@@ -34,6 +36,8 @@ public final class StoreDefinition { | |||
|
|||
private static final Logger LOG = LoggerFactory.getLogger(StoreDefinition.class); | |||
|
|||
private static List<String> batchCreateStatements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a list here? DDLs are being created in Spanner class, so having the list here requires these extra methods and it's a bit error-prone.
How about we add two new methods in the StructuredTableAdmin: isBatchCreateRequired() and batchCommit().
So all we need to do here in StoreDefinition is to call tableAdmin.commit() at the end if tableAdmin.isBatchCreateRequired() returns true.
Now in Spanner implementation, create calls will only store DDLs statements in a list, and upon batchCommit(), we execute all those DDLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion Masoud.
I have moved the ddl statement retention logic in Spanner storage impl.
Regarding the create calls, I think we should keep this create & defer create method different.
This was also highlighted in this review comment by Sanket #15743 (comment)
Create() usually gives the perception that the method itself handles the table creation and if we just persist ddl statements here, it is prone to mistakes like batchCommit() not being called later.
If deferCreate() is used, developer will know, we should always call batchCommit().
I have added another method similar to existing updateOrCreate - the new method is named updateOrDeferCreate.
This keeps the method name & its purpose clear.
Please take a look and let me know what you think.
Quality Gate failedFailed conditions |
Table creation for Cloud Spanner is currently taking around 5-6 mins. This is adding delays when service pods come up and this in turn increases CDAP instance creation time. The table creation time in case of Cloud SQL is just 3s.
Current logs:
This PR aims to optimize the spanner table creation.
UpdateDDL statements are costlier in cloud spanner than cloud sql and it is recommended that we should try to batch these statements and execute together rather than individual calls for every create().
Batching is not needed in case of other supported Storage types. So we should not update the current flow for them.
End result: With these changes, table creation time has reduced to 1min, for the pod which does the major heavy lifting. Other pods are up in 20-40s.
Add unit test: