[#5557] improvement(CI): Add some docs and tests about how to use Azure Blob Storage(ADLS) in Hive #5558

Merged: 19 commits, Nov 29, 2024

yuqi1129
Contributor

What changes were proposed in this pull request?

Add some tests to demonstrate how to use ADLS in Hive.

Why are the changes needed?

To verify if we can use ADLS in Hive.
Fix: #5557

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Tested manually.

@yuqi1129
Contributor Author

This PR depends on #5553

@yuqi1129 yuqi1129 closed this Nov 12, 2024
@yuqi1129 yuqi1129 reopened this Nov 12, 2024
@yuqi1129 yuqi1129 changed the title [#5557] improvement(CI): Add some tests for ADLS in Hive [#5557] improvement(CI): Add some tests about how to use ADLS in Hive Nov 12, 2024
@yuqi1129 yuqi1129 requested a review from jerryshao November 20, 2024 13:45
@yuqi1129 yuqi1129 self-assigned this Nov 20, 2024
@yuqi1129 yuqi1129 changed the title [#5557] improvement(CI): Add some tests about how to use ADLS in Hive [#5557] improvement(CI): Add some docs about how to use ADLS in Hive Nov 21, 2024
@yuqi1129 yuqi1129 changed the title [#5557] improvement(CI): Add some docs about how to use ADLS in Hive [#5557] improvement(CI): Add some docs and tests about how to use ADLS in Hive Nov 21, 2024
@jerryshao
Contributor

Do you need to update the PR based on #5630 ?

@yuqi1129
Contributor Author

Do you need to update the PR based on #5630 ?

No, #5630 still needs to be polished and requires more effort.

@yuqi1129 yuqi1129 changed the title [#5557] improvement(CI): Add some docs and tests about how to use ADLS in Hive [#5557] improvement(CI): Add some docs and tests about how to use Azure Blob Storage(ADLS) in Hive Nov 26, 2024
Comment on lines 134 to 135
// You need this to run test CatalogHiveABSIT
// testImplementation(libs.hadoop3.common)
Contributor

Is this comment still valid?

Contributor Author

First, the Azure Blob Storage test requires hadoop3 common; otherwise the following error occurs:

java.lang.NoClassDefFoundError: org/apache/hadoop/security/ssl/DelegatingSSLSocketFactory$SSLChannelMode
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:151)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:108)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3261)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3310)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3278)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
	at org.apache.gravitino.catalog.hive.integration.test.CatalogHiveABSIT.initFileSystem(CatalogHiveABSIT.java:74)
	at org.apache.gravitino.catalog.hive.integration.test.CatalogHiveIT.startup(CatalogHiveIT.java:215)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at

If we do not comment out this dependency, hadoop2 common and hadoop3 common conflict, causing failures in other tests.

So for now we need to keep the dependency commented out.
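
To check which side of this conflict a given test classpath is on, one option is a small probe for the class named in the stack trace above. This is only a minimal sketch and not part of the PR: `ClasspathProbe` and `isPresent` are hypothetical names, and the only detail taken from the trace is the class name `org.apache.hadoop.security.ssl.DelegatingSSLSocketFactory$SSLChannelMode`.

```java
/**
 * Hypothetical helper (not part of this PR): reports whether a class can be
 * found on the current classpath, without running its static initializers.
 */
public final class ClasspathProbe {
    private ClasspathProbe() {}

    /** Returns true if the class with the given binary name can be loaded. */
    public static boolean isPresent(String className) {
        try {
            // initialize=false: locate and link the class, but do not initialize it.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that is present but cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the NoClassDefFoundError in the stack trace.
        String missingClass =
            "org.apache.hadoop.security.ssl.DelegatingSSLSocketFactory$SSLChannelMode";
        System.out.println(missingClass + " present: " + isPresent(missingClass));
    }
}
```

Running `main` with the same classpath as the test would print whether the class is visible; if it is not, the `NoClassDefFoundError` above is expected when `CatalogHiveABSIT` initializes the file system.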

Contributor

Please explain in the code why this dependency has to be commented out; the current comment alone is confusing and makes the code hard for reviewers to understand.

@@ -11,6 +11,8 @@ license: "This software is licensed under the Apache License version 2."

Since Hive 2.x, Hive has supported S3 as a storage backend, enabling users to store and manage data in Amazon S3 directly through Hive. Gravitino enhances this capability by supporting the Hive catalog with S3, allowing users to efficiently manage the storage locations of files located in S3. This integration simplifies data operations and enables seamless access to S3 data from Hive queries.

For ADLS (or Azure Blob storage(abs), or Azure Data lake storage(v2)), the integration is similar to S3. The only difference is the configuration properties for ADLS. The following sections will guide you through the necessary steps to configure the Hive catalog to utilize S3 as a storage backend, including configuration details and examples for creating databases and tables.
Contributor

"For ADLS (aka. Azure Blob Storage (ABS), or Azure Data Lake Storage (v2)), ..."

Contributor

"S3 as a storage backend"

Is it S3 or ADLS?

Contributor Author

resolved.

docs/hive-catalog-with-s3-and-adls.md (resolved)
@yuqi1129 yuqi1129 closed this Nov 28, 2024
@yuqi1129 yuqi1129 reopened this Nov 28, 2024
@yuqi1129 yuqi1129 closed this Nov 28, 2024
@yuqi1129 yuqi1129 reopened this Nov 28, 2024
@jerryshao jerryshao merged commit e25a67a into apache:main Nov 29, 2024
50 of 55 checks passed
Successfully merging this pull request may close these issues.

[Improvement] Add tests about how to use ADLS in Hive
2 participants