[CDAP-21087] Optimize spanner create table operations #15743

Open
wants to merge 6 commits into
base: develop

Contributor

@sidhdirenge sidhdirenge commented Nov 25, 2024

Table creation on Cloud Spanner currently takes around 5-6 minutes. This delays service pod startup, which in turn increases CDAP instance creation time. On Cloud SQL, table creation takes just 3 seconds.

Current logs:

2024-11-13 13:09:52,307 - INFO  [main:i.c.c.m.e.k.StorageMain@69] - Creating storages
.
.
2024-11-13 13:14:21,632 - INFO  [main:i.c.c.m.e.k.StorageMain@113] - Storage creation completed
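
As a quick check, the interval between the two log lines above can be computed from their timestamps. This is only an illustrative sketch; the class and method names are made up for this example and are not part of CDAP.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class StorageCreationElapsed {
  // Timestamp layout used by the two log lines quoted above.
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  /** Returns the time between two log timestamps in the layout above. */
  static Duration elapsed(String start, String end) {
    return Duration.between(
        LocalDateTime.parse(start, LOG_TIME),
        LocalDateTime.parse(end, LOG_TIME));
  }

  public static void main(String[] args) {
    Duration d = elapsed("2024-11-13 13:09:52,307", "2024-11-13 13:14:21,632");
    System.out.println(d.toMillis() + " ms");
  }
}
```

For this particular excerpt the interval is 269325 ms, roughly four and a half minutes.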

This PR aims to optimize Spanner table creation.
UpdateDDL statements are costlier in Cloud Spanner than in Cloud SQL, and the recommendation is to batch these statements and execute them together rather than making an individual call for every create().

Batching is not needed for the other supported storage types, so we should not change the current flow for them.

End result: with these changes, table creation time is reduced to about 1 minute for the pod that does most of the heavy lifting. Other pods are up in 20-40 seconds.
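
The difference between per-table calls and one batched call can be sketched as follows. This is not the code in this PR: `DdlExecutor` is a hypothetical stand-in for one schema-update round trip, and the table names are placeholders. It only shows how batching reduces the number of round trips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DdlBatchingSketch {
  /** Hypothetical stand-in for one schema-update round trip to the database. */
  interface DdlExecutor {
    void execute(List<String> statements);
  }

  /** Counts round trips and statements so the two strategies can be compared. */
  static class CountingExecutor implements DdlExecutor {
    int calls = 0;
    int statements = 0;

    @Override
    public void execute(List<String> batch) {
      calls++;
      statements += batch.size();
    }
  }

  /** One round trip per statement, as when every create() issues its own DDL call. */
  static void createIndividually(List<String> ddl, DdlExecutor executor) {
    for (String statement : ddl) {
      executor.execute(Collections.singletonList(statement));
    }
  }

  /** One round trip carrying all statements. */
  static void createBatched(List<String> ddl, DdlExecutor executor) {
    if (!ddl.isEmpty()) {
      executor.execute(new ArrayList<>(ddl));
    }
  }

  public static void main(String[] args) {
    // Placeholder table names, not CDAP's real tables.
    List<String> ddl = Arrays.asList(
        "CREATE TABLE table_a (id STRING(MAX) NOT NULL) PRIMARY KEY (id)",
        "CREATE TABLE table_b (id STRING(MAX) NOT NULL) PRIMARY KEY (id)",
        "CREATE TABLE table_c (id STRING(MAX) NOT NULL) PRIMARY KEY (id)");

    CountingExecutor perTable = new CountingExecutor();
    createIndividually(ddl, perTable);

    CountingExecutor batched = new CountingExecutor();
    createBatched(ddl, batched);

    System.out.println("individual calls: " + perTable.calls);
    System.out.println("batched calls: " + batched.calls);
  }
}
```

Both strategies submit the same statements; only the number of calls changes, which is where the saving is expected to come from if each call carries a fixed overhead.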

Add unit test:
[Screenshot: 2024-12-03 at 11 59 57 AM]

@sidhdirenge sidhdirenge self-assigned this Nov 28, 2024
@sidhdirenge sidhdirenge added the "build" label (Triggers github actions build) Dec 2, 2024
@@ -34,6 +36,8 @@ public final class StoreDefinition {

private static final Logger LOG = LoggerFactory.getLogger(StoreDefinition.class);

private static List<String> batchCreateStatements;
Contributor

@masoud-io masoud-io Dec 2, 2024

Why keep a list here? The DDLs are created in the Spanner class, so holding the list here requires these extra methods and is a bit error-prone.
How about adding two new methods to StructuredTableAdmin: isBatchCreateRequired() and batchCommit()?

Then all StoreDefinition needs to do is call tableAdmin.batchCommit() at the end if tableAdmin.isBatchCreateRequired() returns true.

In the Spanner implementation, create calls would only store the DDL statements in a list, and batchCommit() would execute all of them.
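
A minimal sketch of this suggestion, under stated assumptions: `TableAdmin` below is a simplified, hypothetical stand-in that takes raw DDL strings, not the actual StructuredTableAdmin interface, and `ImmediateAdmin` / `BufferingAdmin` are invented names for a non-batching backend and a Spanner-like backend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BatchCommitSketch {
  /** Hypothetical, simplified admin; a stand-in, not the CDAP interface. */
  interface TableAdmin {
    void create(String ddl);

    /** Backends that execute DDL immediately keep this default. */
    default boolean isBatchCreateRequired() {
      return false;
    }

    /** No-op unless the backend buffers DDL in create(). */
    default void batchCommit() {
    }
  }

  /** Executes each statement as soon as create() is called. */
  static class ImmediateAdmin implements TableAdmin {
    final List<List<String>> executed = new ArrayList<>();

    @Override
    public void create(String ddl) {
      executed.add(Collections.singletonList(ddl));
    }
  }

  /** Buffers statements in create() and runs them together in batchCommit(). */
  static class BufferingAdmin implements TableAdmin {
    final List<List<String>> executed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    @Override
    public void create(String ddl) {
      pending.add(ddl);
    }

    @Override
    public boolean isBatchCreateRequired() {
      return true;
    }

    @Override
    public void batchCommit() {
      if (!pending.isEmpty()) {
        executed.add(new ArrayList<>(pending));
        pending.clear();
      }
    }
  }

  /** Caller side: the only backend-specific step is the final commit. */
  static void createAll(TableAdmin admin, List<String> ddl) {
    for (String statement : ddl) {
      admin.create(statement);
    }
    if (admin.isBatchCreateRequired()) {
      admin.batchCommit();
    }
  }

  public static void main(String[] args) {
    List<String> ddl = Arrays.asList("CREATE TABLE a ...", "CREATE TABLE b ...");

    ImmediateAdmin immediate = new ImmediateAdmin();
    createAll(immediate, ddl);
    System.out.println("immediate executions: " + immediate.executed.size());

    BufferingAdmin buffering = new BufferingAdmin();
    createAll(buffering, ddl);
    System.out.println("buffered executions: " + buffering.executed.size());
  }
}
```

With this shape the caller never holds the statement list itself, which is the point of the suggestion; the trade-off is that create() on a buffering backend does nothing visible until batchCommit() is called.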

Contributor Author

Thanks for the suggestion, Masoud.

I have moved the DDL statement retention logic into the Spanner storage implementation.
Regarding the create calls, I think we should keep the create and defer-create methods separate.
This was also highlighted in this review comment by Sanket: #15743 (comment)

create() usually gives the impression that the method itself handles table creation. If it only persists DDL statements, it invites mistakes such as batchCommit() never being called afterwards.
If deferCreate() is used, the developer knows that batchCommit() must always be called.

I have added another method alongside the existing updateOrCreate; the new method is named updateOrDeferCreate.
This keeps the method name and its purpose clear.
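
The naming argument can be illustrated with a small in-memory sketch. Everything here is hypothetical: `SketchAdmin` is not the class in this PR, table names stand in for full table specifications, and the behavior shown for updateOrDeferCreate (update an existing table right away, otherwise queue its creation) is an assumption based on the method name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DeferCreateSketch {
  /** Hypothetical in-memory admin used only to contrast the method names. */
  static class SketchAdmin {
    final Set<String> tables = new LinkedHashSet<>();
    final List<String> pending = new ArrayList<>();
    int ddlCalls = 0;

    /** Creates the table immediately: one DDL call per table. */
    void create(String table) {
      ddlCalls++;
      tables.add(table);
    }

    /** Only queues the table; nothing exists until batchCommit() runs. */
    void deferCreate(String table) {
      if (!pending.contains(table)) {
        pending.add(table);
      }
    }

    /** Assumed semantics: update an existing table now, otherwise queue its creation. */
    void updateOrDeferCreate(String table) {
      if (tables.contains(table)) {
        ddlCalls++; // stands in for a schema update on the existing table
      } else {
        deferCreate(table);
      }
    }

    /** Creates all queued tables with a single DDL call. */
    void batchCommit() {
      if (pending.isEmpty()) {
        return;
      }
      ddlCalls++;
      tables.addAll(pending);
      pending.clear();
    }
  }

  public static void main(String[] args) {
    SketchAdmin admin = new SketchAdmin();
    admin.create("existing");              // table exists right after this call
    admin.updateOrDeferCreate("existing"); // handled immediately
    admin.updateOrDeferCreate("new_a");    // queued only
    admin.updateOrDeferCreate("new_b");    // queued only
    System.out.println("before commit: " + admin.tables);
    admin.batchCommit();
    System.out.println("after commit: " + admin.tables);
  }
}
```

Because the deferring methods say so in their names, a reader of the call site can see that a later batchCommit() is required, while create() keeps its usual meaning.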

Please take a look and let me know what you think.

@sidhdirenge sidhdirenge marked this pull request as ready for review December 3, 2024 06:37
@sidhdirenge sidhdirenge changed the title from "Optimize spanner create table operations" to "[CDAP-21087] Optimize spanner create table operations" Dec 3, 2024
sonarcloud bot commented Dec 3, 2024

Quality Gate failed

Failed conditions
65.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
