This project processes and analyzes large datasets for a television production company. The data, exceeding 20 million records, originates from three source types:
- User contract information and interaction data (TXT files)
- User watch history logs (JSON files)
- User search history logs (Parquet files)
Data is retrieved from several storage systems, including MySQL, Azure SQL, and the local file system, then transformed and organized into structured insight tables in a PostgreSQL database.
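As a rough illustration, reading the three source formats with Spark might look like the sketch below; the file paths, header option, and delimiter are placeholders, not the project's actual layout:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tv_etl").getOrCreate()

# Contract and interaction data: delimited text files (delimiter is an assumption)
contracts = (spark.read
             .option("header", True)
             .option("delimiter", "\t")
             .csv("data/contracts/*.txt"))

# Watch history logs: JSON
watch_logs = spark.read.json("data/log_content/*.json")

# Search history logs: Parquet
search_logs = spark.read.parquet("data/log_search/*.parquet")
```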
Technology stack:
- Azure SQL
- MySQL
- Python
- Apache Spark
- PostgreSQL
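Spark talks to the relational sources and the PostgreSQL sink over JDBC. A minimal sketch, assuming the standard MySQL and PostgreSQL JDBC drivers and placeholder hosts, tables, and credentials:

```python
from pyspark.sql import SparkSession

# MySQL and PostgreSQL JDBC drivers must be on the Spark classpath
spark = (SparkSession.builder
         .appName("jdbc_template")
         .config("spark.jars.packages",
                 "mysql:mysql-connector-java:8.0.33,"
                 "org.postgresql:postgresql:42.6.0")
         .getOrCreate())

# Read a source table from MySQL (table name is a placeholder)
users = (spark.read.format("jdbc")
         .option("url", "jdbc:mysql://<host>:3306/<database>")
         .option("dbtable", "user_contracts")
         .option("user", "<user>")
         .option("password", "<password>")
         .load())

# Write an insight table to PostgreSQL
(users.write.format("jdbc")
 .option("url", "jdbc:postgresql://<host>:5432/<database>")
 .option("dbtable", "insight.user_contracts")
 .option("user", "<user>")
 .option("password", "<password>")
 .mode("overwrite")
 .save())
```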
Key scripts and notebooks:
- etl_log_content.py: Ingests, transforms, and loads watch history data (a minimal sketch of this flow follows the list)
- etl_log_search.py: Ingests, transforms, and loads search history data
- user_analysis.ipynb: Analyzes user behavior from contract information and interaction data
- mysql_azuresql_connector_template.ipynb: PySpark templates for connecting to the data sources and destinations
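The ETL scripts follow an extract-transform-load pattern. Below is a minimal sketch of what a flow like etl_log_content.py could look like; the column names (user_id, category, duration) and the target table are illustrative assumptions, not the project's actual schema:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("etl_log_content").getOrCreate()

# Extract: raw watch history events (schema assumed for illustration)
logs = spark.read.json("data/log_content/*.json")

# Transform: total watch time per user and content category
insight = (logs
           .groupBy("user_id", "category")
           .agg(F.sum("duration").alias("total_duration"))
           .withColumn("load_date", F.current_date()))

# Load: append the insight table into PostgreSQL
# (PostgreSQL JDBC driver must be on the classpath, as in the connector sketch above)
(insight.write.format("jdbc")
 .option("url", "jdbc:postgresql://<host>:5432/<database>")
 .option("dbtable", "insight.watch_time_by_category")
 .option("user", "<user>")
 .option("password", "<password>")
 .mode("append")
 .save())
```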