- Creating a data stream in Kinesis Data Streams.
- Developing Python applications (.ipynb notebooks) that write records to the stream (see the producer sketch after this list).
- Creating a Kinesis Data Firehose delivery stream with a destination bucket in S3 (see the delivery-stream sketch after this list).
- Running the applications to populate the S3 bucket with the generated data.
- Creating an access role in IAM.
- Creating a database in AWS Glue.
- Configuring a Glue Crawler to catalog the JSON data generated by the applications as tables (see the crawler sketch after this list).
- Creating a new bucket to receive the tables.
- Creating an ETL job in AWS Glue.
- Configuring the job: selecting the AWS Glue Data Catalog as the source and choosing the database and table, then defining the data and schema transformations.
- Setting the target to the S3 bucket created above, with Parquet format and Snappy compression (see the job-script sketch after this list).
- Running the job.
- Configuring Athena to store query results in a new S3 bucket and selecting the database in the query editor.
- Querying the data with Athena (see the query sketch after this list).
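
For reference, a minimal producer sketch in the spirit of the notebooks, assuming a stream named `etl-demo-stream` in `us-east-1` and a synthetic event schema (all names here are hypothetical, not taken from the repository):

```python
import json
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def generate_record():
    """Build a synthetic JSON event (hypothetical schema)."""
    return {
        "user_id": random.randint(1, 100),
        "event": random.choice(["click", "view", "purchase"]),
        "timestamp": time.time(),
    }

# Push a batch of events into the Kinesis data stream.
for _ in range(100):
    record = generate_record()
    kinesis.put_record(
        StreamName="etl-demo-stream",
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=str(record["user_id"]),
    )
    time.sleep(0.1)
```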
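
The delivery stream can also be created programmatically rather than in the console; a sketch assuming the hypothetical stream above, placeholder ARNs, and a raw-data bucket named `etl-demo-raw`:

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Use the Kinesis data stream as the source and the S3 bucket as the
# destination; the account ID and role ARNs below are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="etl-demo-delivery",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/etl-demo-stream",
        "RoleARN": "arn:aws:iam::123456789012:role/etl-demo-firehose-role",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/etl-demo-firehose-role",
        "BucketARN": "arn:aws:s3:::etl-demo-raw",
    },
)
```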
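
Likewise, the database and crawler steps can be sketched with boto3, assuming the hypothetical names `streaming_db`, `etl-demo-crawler`, and the raw bucket above:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create the Glue database that will hold the crawled tables.
glue.create_database(DatabaseInput={"Name": "streaming_db"})

# The crawler infers a schema from the JSON files in the raw bucket and
# registers a table in the Data Catalog (it catalogs the data in place,
# it does not rewrite it); the role ARN is a placeholder.
glue.create_crawler(
    Name="etl-demo-crawler",
    Role="arn:aws:iam::123456789012:role/etl-demo-glue-role",
    DatabaseName="streaming_db",
    Targets={"S3Targets": [{"Path": "s3://etl-demo-raw/"}]},
)
glue.start_crawler(Name="etl-demo-crawler")
```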
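
The core of the Glue job (catalog source, Parquet output with Snappy compression) would look roughly like this, again with hypothetical database, table, and bucket names:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled JSON table from the AWS Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="streaming_db",
    table_name="raw_json",
)

# Write the records to the target bucket as Snappy-compressed Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://etl-demo-parquet/"},
    format="parquet",
    format_options={"compression": "snappy"},
)

job.commit()
```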
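
Finally, a sketch of running a query through Athena from Python, assuming the hypothetical table above and a results bucket named `etl-demo-athena-results`:

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; Athena writes its result files to the output bucket.
response = athena.start_query_execution(
    QueryString="SELECT event, COUNT(*) AS total FROM raw_json GROUP BY event",
    QueryExecutionContext={"Database": "streaming_db"},
    ResultConfiguration={"OutputLocation": "s3://etl-demo-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print the result rows (the first row holds the column headers).
if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```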