fix(athena): fixed write_iceberg which could fire a ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error #2908

Conversation

erwan-simon
Contributor

@erwan-simon erwan-simon commented Jul 22, 2024

Feature or Bugfix

Bugfix

Detail

Athena: fixed write_iceberg, which could raise an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error

Relates

None

Details

Strictly speaking, this is not a bugfix, since the problem lies with Athena rather than aws-sdk-pandas.

When you write an Iceberg table with partition keys through Athena and the write creates more than 100 distinct partitions, Athena fails with the error ICEBERG_TOO_MANY_OPEN_PARTITIONS (see post here). This limitation makes it really difficult to use Iceberg at scale with Athena. One workaround is of course to use Spark, but I think it would be great if aws-sdk-pandas included a workaround of its own.

My approach to this problem is simple: split the dataframe into chunks that each contain at most 100 distinct partition key combinations, then write the chunks sequentially using Athena.
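To illustrate the chunking idea, here is a minimal sketch, not the code from this PR. The helper name `chunk_by_partitions`, the constant `MAX_OPEN_PARTITIONS`, and the sample column names are assumptions made for the example. It only shows how a dataframe could be split so that no chunk spans more than 100 partition key combinations.

```python
import pandas as pd

# Limit described in this PR; treated here as an assumption, not read from Athena.
MAX_OPEN_PARTITIONS = 100


def chunk_by_partitions(df, partition_cols, max_partitions=MAX_OPEN_PARTITIONS):
    """Yield sub-dataframes with at most `max_partitions` distinct key combinations each."""
    # Distinct combinations of partition key values, in order of first appearance.
    combos = df[partition_cols].drop_duplicates().reset_index(drop=True)
    for start in range(0, len(combos), max_partitions):
        batch = combos.iloc[start:start + max_partitions]
        # The inner join keeps only the rows whose keys belong to this batch.
        yield df.merge(batch, on=partition_cols, how="inner")


# Hypothetical data: 1000 rows spread over 250 distinct values of "day".
df = pd.DataFrame({"day": [i % 250 for i in range(1000)], "value": range(1000)})
chunks = list(chunk_by_partitions(df, ["day"]))
```

Each chunk would then be handed to the Iceberg write one after another, so a single Athena query never has to open more than 100 partitions.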

I also edited the unit tests so that they pass.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@erwan-simon erwan-simon force-pushed the fix/iceberg_too_many_open_partitions_athena_error branch from bb9b67a to 3d72079 Compare July 22, 2024 14:58
@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: bb9b67a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 3d72079
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 8381bc0
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 8381bc0
  • Result: FAILED
  • Build Logs (available for 30 days)

@erwan-simon erwan-simon force-pushed the fix/iceberg_too_many_open_partitions_athena_error branch from 8381bc0 to bc4a6f2 Compare July 22, 2024 17:30
@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: bc4a6f2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: bc4a6f2
  • Result: FAILED
  • Build Logs (available for 30 days)

@erwan-simon erwan-simon marked this pull request as draft July 24, 2024 09:38
@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 0cd9098
  • Result: FAILED
  • Build Logs (available for 30 days)

@erwan-simon erwan-simon force-pushed the fix/iceberg_too_many_open_partitions_athena_error branch from 0cd9098 to 7be568b Compare August 11, 2024 17:00
@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 7be568b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 0cd9098
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 7be568b
  • Result: FAILED
  • Build Logs (available for 30 days)

@erwan-simon erwan-simon deleted the fix/iceberg_too_many_open_partitions_athena_error branch October 8, 2024 12:02

2 participants