MarcusLe02/big-data-tv-production

Big Data Project - TV production company

Introduction

This project processes and analyzes large datasets for a television production company. The data exceeds 20 million records and comes in three file formats:

  • User contract information and interaction data (TXT files)
  • User watch history logs (JSON files)
  • User search history logs (Parquet files)
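
As a rough illustration of how the three formats above might be loaded with Apache Spark, the sketch below picks a Spark reader format from the file extension. The names `READER_FORMATS`, `reader_format`, and `load_source` are hypothetical and not taken from this repository; the `spark` argument is assumed to be an existing `SparkSession`.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a built-in Spark reader format.
READER_FORMATS = {
    ".txt": "text",        # contract and interaction data
    ".json": "json",       # watch history logs
    ".parquet": "parquet", # search history logs
}


def reader_format(path: str) -> str:
    """Return the Spark reader format for a file path, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READER_FORMATS:
        raise ValueError(f"unsupported file type: {path}")
    return READER_FORMATS[suffix]


def load_source(spark, path: str):
    """Load one source file into a DataFrame.

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    return spark.read.format(reader_format(path)).load(path)
```

With an active session, a call such as `load_source(spark, "search_log.parquet")` (an illustrative path) would return a DataFrame. TXT files loaded this way arrive as a single `value` column and would still need to be parsed into fields.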

Data is read from MySQL, Azure SQL, and the local file system, then transformed and organized into structured insight tables in a PostgreSQL database.
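
The database side of this flow could be sketched with Spark's JDBC data source, as below. This is a minimal sketch, not the project's actual code: the helper names (`jdbc_url`, `read_table`, `write_table`), hosts, and table names are assumptions, and the matching JDBC driver JARs are assumed to be available on the Spark classpath.

```python
def jdbc_url(dialect: str, host: str, port: int, database: str) -> str:
    """Build a JDBC URL for the databases named above (hypothetical helper)."""
    if dialect == "sqlserver":
        # Azure SQL is reached through the SQL Server JDBC driver.
        return f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    if dialect in ("mysql", "postgresql"):
        return f"jdbc:{dialect}://{host}:{port}/{database}"
    raise ValueError(f"unsupported dialect: {dialect}")


def read_table(spark, url: str, table: str, user: str, password: str):
    """Read one table into a DataFrame; `spark` is an active SparkSession."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .load()
    )


def write_table(df, url: str, table: str, user: str, password: str,
                mode: str = "append") -> None:
    """Write a transformed DataFrame to a target table, e.g. in PostgreSQL."""
    (
        df.write.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .mode(mode)
        .save()
    )
```

For example, `jdbc_url("postgresql", "localhost", 5432, "insights")` yields `jdbc:postgresql://localhost:5432/insights`, which could be passed to `write_table` together with a transformed DataFrame.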

Data Snapshots

User Contract & Interaction Data

[Image: User Contract Information & Interaction Data]

User Watch History Data

[Image: User Watch History Data]

User Log Search Data

[Image: User Log Search Data]

Technologies

  • Azure SQL
  • MySQL
  • Python
  • Apache Spark
  • PostgreSQL

Project Files
