
User Guide

This user guide will help you get started with the Azure Data Lake Storage Gen 2 (ADLS) Files Virtual Schema.

Installation

Upload the latest available release of this adapter to BucketFS. See the Exasol documentation for details: Create a bucket in BucketFS, Access Files in BucketFS.
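As a sketch, you can upload the jar over the BucketFS HTTP interface with curl. The host, port, bucket, and write password below are placeholders for a default bucket setup and need to be adapted to your installation:

curl -X PUT -T document-files-virtual-schema-dist-8.1.5-azure-datalake-storage-gen2-2.1.2.jar \
  http://w:<write-password>@<exasol-host>:2580/default/document-files-virtual-schema-dist-8.1.5-azure-datalake-storage-gen2-2.1.2.jar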

Then create a schema to hold the adapter script.

CREATE SCHEMA ADAPTER;

Next, create the adapter script:

CREATE OR REPLACE JAVA ADAPTER SCRIPT ADAPTER.AZURE_DATALAKE_STORAGE_GEN2_FILES_ADAPTER AS
   %scriptclass com.exasol.adapter.RequestDispatcher;
   %jar /buckets/bfsdefault/default/document-files-virtual-schema-dist-8.1.5-azure-datalake-storage-gen2-2.1.2.jar;
/

In addition to the adapter script, you need to create a UDF that handles loading the data:

CREATE OR REPLACE JAVA SET SCRIPT ADAPTER.IMPORT_FROM_AZURE_DATA_LAKE_STORAGE_GEN2_DOCUMENT_FILES(
  DATA_LOADER VARCHAR(2000000),
  SCHEMA_MAPPING_REQUEST VARCHAR(2000000),
  CONNECTION_NAME VARCHAR(500))
  EMITS(...) AS
    %scriptclass com.exasol.adapter.document.UdfEntryPoint;
    %jar /buckets/bfsdefault/default/document-files-virtual-schema-dist-8.1.5-azure-datalake-storage-gen2-2.1.2.jar;
/

Creating a Connection

To grant the Virtual Schema access to your Azure Data Lake Storage Gen 2 container, you need the account name of your ADLS-enabled storage account, a storage account key, and the container name.

You can find your storage account's access keys in the Azure portal. Navigate to Settings > Access keys in your storage account's menu blade to see the primary and secondary access keys and their connection strings. For details, see the Azure documentation on managing storage account access keys.
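Alternatively, you can list the keys with the Azure CLI. This is a sketch assuming you are logged in via az login; the resource group and account name are placeholders:

az storage account keys list --resource-group <resource-group> --account-name <storage-account-name>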

Now you need to define a connection that includes the contents of the key file:

CREATE CONNECTION ADLS_CONNECTION
   TO ''
   USER ''
   IDENTIFIED BY '{
       "adlsStorageAccountName": "myadlsenabledstorageaccount",
       "adlsStorageAccountKey": "<key>",
       "adlsContainerName": "<ADLS CONTAINER NAME>"
   }';

The connection stores all connection details as JSON in the IDENTIFIED BY part. There you can use the following keys:

| Key                    | Default | Required | Example                       |
|------------------------|---------|----------|-------------------------------|
| adlsStorageAccountName |         | Yes      | "myadlsenabledstorageaccount" |
| adlsStorageAccountKey  |         | Yes      | "<key>"                       |
| adlsContainerName      |         | Yes      | "my-container"                |

Defining the Schema Mapping

Before creating a Virtual Schema, you need a mapping definition that describes how the document data maps to Exasol tables.

For that we use the Exasol Document Mapping Language (EDML). It is the same for all document Virtual Schemas. To learn how to define such EDML definitions, check the user guide in the common repository for all document Virtual Schemas.

In the definition you have to set the source property. Use something like my-folder/user.json; the path is relative to the container's root. A sketch of a complete mapping follows below.

This Virtual Schema adapter automatically detects the type of the document file by the file extension. You can find a list of supported file types and their extensions in the user guide of the common repository for all file Virtual Schemas.
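The following is a minimal sketch of such a mapping definition. The destination table name, column names, and sizes are hypothetical; consult the EDML user guide for the exact schema version and all available options:

{
    "$schema": "https://schemas.exasol.com/edml-1.3.0.json",
    "source": "my-folder/user.json",
    "destinationTable": "USERS",
    "mapping": {
        "fields": {
            "id": {
                "toVarcharMapping": {
                    "varcharColumnSize": 20
                }
            },
            "name": {
                "toVarcharMapping": {
                    "varcharColumnSize": 100
                }
            }
        }
    }
}

Upload this file to BucketFS and reference it in the MAPPING property when creating the Virtual Schema (see below).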

Mapping Multiple Files

For some file types (for example JSON), each source file contains only a single document. That means you have one file for each row of the mapped table. To define mappings for such types, you can use GLOB syntax in the source property: * and ? act as wildcards, where * matches any number of characters and ? matches exactly one.
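For example, a source entry matching all JSON files directly inside a folder (my-folder is a placeholder) could look like this:

"source": "my-folder/*.json"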

Creating the Virtual Schema

Finally, create the Virtual Schema using:

CREATE VIRTUAL SCHEMA FILES_VS_TEST USING ADAPTER.AZURE_DATALAKE_STORAGE_GEN2_FILES_ADAPTER WITH
    CONNECTION_NAME = 'ADLS_CONNECTION'
    MAPPING         = '/bfsdefault/default/path/to/mappings/in/bucketfs';

The CREATE VIRTUAL SCHEMA command accepts the following properties:

| Property          | Mandatory | Default | Description                                                                     |
|-------------------|-----------|---------|---------------------------------------------------------------------------------|
| MAPPING           | Yes       |         | Path to the mapping definition file(s) in BucketFS                              |
| MAX_PARALLEL_UDFS | No        | -1      | Maximum number of UDFs that are executed in parallel. -1 represents unlimited.  |

Now browse the data using your favorite SQL client.
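For example, assuming the hypothetical USERS mapping sketched above, a first test query could look like this:

SELECT * FROM FILES_VS_TEST.USERS;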

Known Issues