[#5557] improvement(CI): Add some docs and tests about how to use Azure Blob Storage(ADLS) in Hive #5558
This PR depends on #5553.
Do you need to update the PR based on #5630?
// You need this to run test CatalogHiveABSIT
// testImplementation(libs.hadoop3.common)
Is this comment still valid?
First, the Azure Blob Storage test requires hadoop3 common; otherwise the following error occurs:
java.lang.NoClassDefFoundError: org/apache/hadoop/security/ssl/DelegatingSSLSocketFactory$SSLChannelMode
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:151)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:108)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3261)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3310)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3278)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
at org.apache.gravitino.catalog.hive.integration.test.CatalogHiveABSIT.initFileSystem(CatalogHiveABSIT.java:74)
at org.apache.gravitino.catalog.hive.integration.test.CatalogHiveIT.startup(CatalogHiveIT.java:215)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at
If we leave this dependency uncommented, hadoop2 common and hadoop3 common conflict, which causes failures in other tests.
So for now we need to keep the dependency commented out.
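To check which situation a given test run is in, a small classpath probe can help. The following is only a minimal sketch, not part of this PR: the class name `HadoopClasspathCheck` and the method `isOnClasspath` are hypothetical, and the only thing taken from the trace above is the name of the class that failed to load.

```java
// Hypothetical helper, not part of the Gravitino code base.
final class HadoopClasspathCheck {

    // Binary name of the class reported in the NoClassDefFoundError above.
    static final String REQUIRED_CLASS =
        "org.apache.hadoop.security.ssl.DelegatingSSLSocketFactory$SSLChannelMode";

    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(REQUIRED_CLASS + " available: " + isOnClasspath(REQUIRED_CLASS));
    }
}
```

If the probe reports the class as unavailable, the test classpath most likely lacks hadoop3 common, which matches the failure shown in the trace.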
Explain why this dependency should be commented out; the current comment on its own is confusing and makes it hard for reviewers to understand.
@@ -11,6 +11,8 @@ license: "This software is licensed under the Apache License version 2."

Since Hive 2.x, Hive has supported S3 as a storage backend, enabling users to store and manage data in Amazon S3 directly through Hive. Gravitino enhances this capability by supporting the Hive catalog with S3, allowing users to efficiently manage the storage locations of files located in S3. This integration simplifies data operations and enables seamless access to S3 data from Hive queries.

For ADLS (or Azure Blob storage(abs), or Azure Data lake storage(v2)), the integration is similar to S3. The only difference is the configuration properties for ADLS. The following sections will guide you through the necessary steps to configure the Hive catalog to utilize S3 as a storage backend, including configuration details and examples for creating databases and tables.
"For ADLS (aka. Azure Blob Storage (ABS), or Azure Data Lake Storage (v2)), ..."
"S3 as a storage backend"
Is it S3 or ADLS?
Resolved.
What changes were proposed in this pull request?
Add some tests to demonstrate how to use ADLS in Hive.
Why are the changes needed?
To verify if we can use ADLS in Hive.
Fix: #5557
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Test manually.