This user guide will help you to get started with the Azure Data Lake Storage Gen 2 (ADLS) File Virtual Schema.
Upload the latest available release of this adapter to BucketFS. See the Exasol documentation for details: Create a bucket in BucketFS, Access Files in BucketFS.
Then create a schema to hold the adapter script:

```sql
CREATE SCHEMA ADAPTER;
```
Next, create the adapter script:

```sql
CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.AZURE_DATALAKE_STORAGE_GEN2_FILES_ADAPTER AS
    %scriptclass com.exasol.adapter.RequestDispatcher;
    %jar /buckets/bfsdefault/default/document-files-virtual-schema-dist-8.1.5-azure-datalake-storage-gen2-2.1.2.jar;
/
```
In addition to the adapter script, you need to create a UDF script that handles loading the data:

```sql
CREATE OR REPLACE JAVA SET SCRIPT ADAPTER.IMPORT_FROM_AZURE_DATA_LAKE_STORAGE_GEN2_DOCUMENT_FILES(
    DATA_LOADER VARCHAR(2000000),
    SCHEMA_MAPPING_REQUEST VARCHAR(2000000),
    CONNECTION_NAME VARCHAR(500))
    EMITS(...) AS
    %scriptclass com.exasol.adapter.document.UdfEntryPoint;
    %jar /buckets/bfsdefault/default/document-files-virtual-schema-dist-8.1.5-azure-datalake-storage-gen2-2.1.2.jar;
/
```
To grant the Virtual Schema access to your Azure Data Lake Storage Gen 2 container, you need the account name of your ADLS-enabled storage account, a storage account key, and the container name.

You can find your storage account's access keys in the Azure portal. Navigate to Settings > Access keys in your storage account's menu blade to see the keys and connection strings for both the primary and the secondary access key.

Now define a connection that contains these details:
```sql
CREATE CONNECTION ADLS_CONNECTION
    TO ''
    USER ''
    IDENTIFIED BY '{
        "adlsStorageAccountName":"myadslenabledstorageaccount",
        "adlsStorageAccountKey":"<key>",
        "adlsContainerName":"<ADLS CONTAINER NAME>"
    }';
```
The connection stores all connection details as JSON in the `IDENTIFIED BY` part. There you can use the following keys:

| Key | Default | Required | Example |
|---|---|---|---|
| `adlsStorageAccountName` | | ✓ | `"myadslenabledstorageaccount"` |
| `adlsStorageAccountKey` | | ✓ | `"<key>"` |
| `adlsContainerName` | | ✓ | `"my-container"` |
Before creating a Virtual Schema, you need to create a mapping definition that defines how the document data is mapped to Exasol tables. For that we use the Exasol Document Mapping Language (EDML), which is the same for all document Virtual Schemas. To learn how to write such EDML definitions, check the user guide in the common repository for all document Virtual Schemas.

In the definition you have to set the `source` property, for example `my-folder/user.json`. The path is relative to the container's root.
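As an illustration, a minimal mapping definition for a hypothetical file `my-folder/user.json` might look like the sketch below. The schema URL, destination table name, field names, and column mappings are assumptions for this example; consult the EDML user guide for the exact, current format:

```json
{
    "$schema": "https://schemas.exasol.com/edml-1.5.0.json",
    "source": "my-folder/user.json",
    "destinationTable": "USERS",
    "mapping": {
        "fields": {
            "id": { "toDecimalMapping": { "decimalPrecision": 18, "decimalScale": 0 } },
            "name": { "toVarcharMapping": { "varcharColumnSize": 100 } }
        }
    }
}
```

Upload the mapping file to BucketFS so that the Virtual Schema can read it.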
This Virtual Schema adapter automatically detects the type of a document file by its file extension. You can find a list of supported file types and their extensions in the user guide of the common repository for all file Virtual Schemas.

For some file types (for example JSON), each source file contains only a single document. That means you have one file for each row of the mapped table. To define mappings for such types, you can use the GLOB syntax: `*` and `?` act as wildcards, where `*` matches any number of characters and `?` matches exactly one.
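The wildcard semantics can be sketched with Python's standard `fnmatch` module, which implements the same `*`/`?` rules. This is an illustration only; the adapter's exact GLOB handling (for example around path separators) may differ:

```python
from fnmatch import fnmatch

# '*' matches any run of characters, so this pattern covers every JSON file in my-folder
matches_star = fnmatch("my-folder/user-42.json", "my-folder/*.json")

# '?' matches exactly one character: user-1.json matches, user-12.json does not
matches_one = fnmatch("my-folder/user-1.json", "my-folder/user-?.json")
misses_two = fnmatch("my-folder/user-12.json", "my-folder/user-?.json")

print(matches_star, matches_one, misses_two)  # True True False
```

A mapping whose `source` is such a GLOB pattern would then map every matching file to one row of the table.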
Finally, create the Virtual Schema using:

```sql
CREATE VIRTUAL SCHEMA FILES_VS_TEST USING ADAPTER.AZURE_DATALAKE_STORAGE_GEN2_FILES_ADAPTER WITH
    CONNECTION_NAME = 'ADLS_CONNECTION'
    MAPPING         = '/bfsdefault/default/path/to/mappings/in/bucketfs';
```
The `CREATE VIRTUAL SCHEMA` command accepts the following properties:

| Property | Mandatory | Default | Description |
|---|---|---|---|
| `MAPPING` | Yes | | Path to the mapping definition file(s) in BucketFS |
| `MAX_PARALLEL_UDFS` | No | -1 | Maximum number of UDFs that are executed in parallel. -1 represents unlimited. |
Now browse the data using your favorite SQL client.
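For example, assuming your mapping definition maps a table named `USERS` (the table name comes from your own mapping, not from the adapter), you could query it like this:

```sql
SELECT * FROM FILES_VS_TEST.USERS;
```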