The steps in this section describe how to deploy the user-defined functions (UDFs) JAR file and configure the UDF scripts.
- Prerequisites
- Download the JAR file
- Create Exasol Bucket
- Upload the JAR file to the bucket
- Create ETL UDF Scripts
As a prerequisite, you need a running Exasol cluster with version 6.0 or later.
You can download the latest assembled JAR file (with all dependencies included) from the GitHub releases page.
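For example, assuming the release assets follow GitHub's standard download URL pattern (adapt the version tag to the release you want):
# Download the assembled JAR for a given release tag
curl -L -o cloud-storage-etl-udfs-<VERSION>.jar \
  https://github.com/exasol/cloud-storage-etl-udfs/releases/download/v<VERSION>/cloud-storage-etl-udfs-<VERSION>.jar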
Alternatively, you can build it from source by following the steps below. This allows you to use the latest commits that may not be released yet.
Clone the repository:
git clone https://github.com/exasol/cloud-storage-etl-udfs
cd cloud-storage-etl-udfs/
Create the assembled JAR file:
./sbtx assembly
The packaged JAR file should be located at target/scala-2.12/cloud-storage-etl-udfs-<VERSION>.jar.
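Before uploading, you can confirm that the artifact was produced:
# Verify the assembled JAR exists
ls -lh target/scala-2.12/cloud-storage-etl-udfs-*.jar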
Next, upload the JAR file to a bucket in the Exasol bucket file system (BucketFS) so that you can reference it in the UDF scripts.
Please see the section "The synchronous cluster file system BucketFS" in the EXASolution User Manual for more details about BucketFS.
For this guide we are going to use an example bucket named bucket1.
Then, upload the JAR file to the bucket bucket1. Before uploading the JAR, please make sure that the BucketFS ports are open. In this example, we use the port number 2580 for HTTP.
Upload the JAR file using curl:
curl -X PUT -T cloud-storage-etl-udfs-<VERSION>.jar \
http://w:[email protected]:2580/bucket1/cloud-storage-etl-udfs-<VERSION>.jar
Please also check out Exasol BucketFS Explorer as an alternative option to upload the JAR file to buckets in BucketFS.
Run the following SQL statements to configure the UDF scripts.
First, create a schema that will contain the UDF scripts.
CREATE SCHEMA ETL;
The following SQL statements create scripts that import data from cloud storage systems.
The first script, IMPORT_PATH, will be used as the entry point when running the import UDF.
OPEN SCHEMA ETL;
CREATE OR REPLACE JAVA SET SCRIPT IMPORT_PATH(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.ImportPath;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
CREATE OR REPLACE JAVA SET SCRIPT IMPORT_FILES(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.ImportFiles;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
CREATE OR REPLACE JAVA SCALAR SCRIPT IMPORT_METADATA(...)
EMITS (filename VARCHAR(200), partition_index VARCHAR(100)) AS
%scriptclass com.exasol.cloudetl.scriptclasses.ImportMetadata;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
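With the scripts in place, an import is triggered through the IMPORT_PATH entry point. The statement below is an illustrative sketch: the target table, the S3 path, and the credential values are assumptions, and the property names should be checked against the project's user guide for the release you deploy.
-- Import Parquet files from an S3 bucket into an existing Exasol table
-- (table name, path, and credentials are placeholders)
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.IMPORT_PATH WITH
  BUCKET_PATH    = 's3a://my-data-bucket/retail/sales_positions/*'
  DATA_FORMAT    = 'PARQUET'
  S3_ACCESS_KEY  = '<AWS_ACCESS_KEY>'
  S3_SECRET_KEY  = '<AWS_SECRET_KEY>'
  S3_ENDPOINT    = 's3.<AWS_REGION>.amazonaws.com'
  PARALLELISM    = 'nproc()';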
For exporting data from Exasol tables to cloud storage filesystems, run these statements to create the export UDF scripts.
OPEN SCHEMA ETL;
CREATE OR REPLACE JAVA SET SCRIPT EXPORT_PATH(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.ExportPath;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
CREATE OR REPLACE JAVA SET SCRIPT EXPORT_TABLE(...) EMITS (ROWS_AFFECTED INT) AS
%scriptclass com.exasol.cloudetl.scriptclasses.ExportTable;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
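An export works the same way through the EXPORT_PATH entry point. Again, this is a sketch with assumed table, path, and credential values:
-- Export an Exasol table as Parquet files to an S3 bucket
-- (table name, path, and credentials are placeholders)
EXPORT RETAIL.SALES_POSITIONS
INTO SCRIPT ETL.EXPORT_PATH WITH
  BUCKET_PATH    = 's3a://my-data-bucket/retail/sales_positions/'
  DATA_FORMAT    = 'PARQUET'
  S3_ACCESS_KEY  = '<AWS_ACCESS_KEY>'
  S3_SECRET_KEY  = '<AWS_SECRET_KEY>'
  S3_ENDPOINT    = 's3.<AWS_REGION>.amazonaws.com'
  PARALLELISM    = 'iproc()';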
Similarly, run the following SQL statements to create the Exasol UDF scripts that import data from a Kafka cluster.
OPEN SCHEMA ETL;
CREATE OR REPLACE JAVA SET SCRIPT KAFKA_PATH(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.KafkaPath;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
CREATE OR REPLACE JAVA SET SCRIPT KAFKA_IMPORT(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.KafkaImport;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
CREATE OR REPLACE JAVA SET SCRIPT KAFKA_METADATA(
params VARCHAR(2000),
kafka_partition DECIMAL(18, 0),
kafka_offset DECIMAL(36, 0)
)
EMITS (partition_index DECIMAL(18, 0), max_offset DECIMAL(36,0)) AS
%scriptclass com.exasol.cloudetl.scriptclasses.KafkaMetadata;
%jar /buckets/bfsdefault/bucket1/cloud-storage-etl-udfs-<VERSION>.jar;
/
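As with the storage import, a Kafka import runs through the KAFKA_PATH entry point. The following sketch assumes an Avro topic registered in a schema registry; the broker addresses, topic, and table names are placeholders, and the parameter names may differ between versions, so verify them against the documentation of the release you deploy.
-- Consume a Kafka topic into an existing Exasol table
-- (servers, registry URL, topic, and table names are placeholders)
IMPORT INTO RETAIL.SALES_POSITIONS
FROM SCRIPT ETL.KAFKA_PATH WITH
  BOOTSTRAP_SERVERS   = 'kafka01.example.com:9092'
  SCHEMA_REGISTRY_URL = 'http://schema-registry.example.com:8081'
  TOPICS              = 'sales_positions'
  TABLE_NAME          = 'RETAIL.SALES_POSITIONS'
  GROUP_ID            = 'exasol-kafka-udf-consumers';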
Please do not forget to adjust the bucket name and the JAR version according to your deployment setup.