fix(athena): fixed write_iceberg which could fire a ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error #2908
Feature or Bugfix
Detail
Relates
Details
Strictly speaking, this is not a bugfix: the underlying problem lies with Athena, not with aws-sdk-pandas.
When you write to an Iceberg table with partition keys through Athena, the query fails with the error ICEBERG_TOO_MANY_OPEN_PARTITIONS if the write creates more than 100 distinct partitions (see post here). This limitation makes it really difficult to use Iceberg at scale with Athena. One workaround is of course to use Spark, but I think it would be great if aws-sdk-pandas included a workaround of its own.
My approach to solving this problem is simple: split the dataframe to write into chunks that each contain at most 100 distinct partition-key combinations, and write those chunks sequentially using Athena.
I also edited the unit tests to make them work.
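For readers who want a feel for the chunking idea, here is a minimal standard-library sketch. It is an illustration only, not the code in this PR: the function name `chunk_rows_by_partition` and the constant `MAX_OPEN_PARTITIONS` are hypothetical, and the limit of 100 is simply the figure quoted above.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Assumption: the limit of 100 open partitions quoted in this description.
MAX_OPEN_PARTITIONS = 100


def chunk_rows_by_partition(
    partition_keys: Iterable[Tuple[Hashable, ...]],
    max_partitions: int = MAX_OPEN_PARTITIONS,
) -> List[List[int]]:
    """Group row positions into chunks that each contain at most
    `max_partitions` distinct partition-key combinations.

    `partition_keys` yields one tuple per row (the row's partition values).
    The result is a list of chunks, each a sorted list of row positions.
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")

    # Map each distinct key combination to the positions of its rows.
    # Dicts preserve insertion order, so keys stay in first-seen order.
    rows_by_key: Dict[Tuple[Hashable, ...], List[int]] = {}
    for position, key in enumerate(partition_keys):
        rows_by_key.setdefault(key, []).append(position)

    distinct_keys = list(rows_by_key)
    chunks: List[List[int]] = []
    for start in range(0, len(distinct_keys), max_partitions):
        batch = distinct_keys[start:start + max_partitions]
        # All rows sharing a key combination land in the same chunk.
        chunks.append(sorted(pos for key in batch for pos in rows_by_key[key]))
    return chunks
```

With a pandas DataFrame `df` and a list of partition columns `cols`, the key tuples could come from `df[cols].itertuples(index=False, name=None)`, and each returned list of positions could select one chunk via `df.iloc[positions]` to be written in turn.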
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.