YouTube Data Harvesting and Warehousing using SQL, MongoDB and Streamlit
SOCIAL MEDIA
The goal is to create a Streamlit application that lets users access and analyze data from multiple YouTube channels.
- Streamlit is a great choice for building data visualization and analysis tools quickly and easily.
- We can use Streamlit to create a simple UI where users can enter a YouTube channel ID, view the channel details, and select channels to migrate to the data warehouse.
- We will need to use the YouTube API to retrieve channel and video data.
- We can use the Google API client library for Python to make requests to the API.
- After retrieving the data from the YouTube API, we should store it in a MongoDB data lake.
- MongoDB is a great choice for a data lake because it can handle unstructured and semi-structured data easily.
- After collecting data for multiple channels, we should migrate it to a SQL data warehouse.
- We can use a SQL database such as MySQL or PostgreSQL for this. Here we have used MySQL.
- We can use SQL queries to join the tables in the SQL data warehouse and retrieve data for specific channels based on user input.
- We can use a Python SQL library such as SQLAlchemy to interact with the SQL database.
- Finally, we display the retrieved data in the Streamlit app.
- We can use Streamlit's data visualization features to create charts and graphs to help users analyze the data.
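The retrieval step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `fetch_channel` and `parse_channel` and the output field names are assumptions, while `build(...)` and `channels().list(...)` come from the Google API client library for Python. The parsing helper is kept free of network calls so it can be tried on a saved response.

```python
def fetch_channel(api_key, channel_id):
    """Call the YouTube Data API v3 channels.list endpoint for one channel."""
    # Imported lazily so parse_channel can be used without the client library.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    request = youtube.channels().list(
        part="snippet,statistics,contentDetails", id=channel_id
    )
    return request.execute()


def parse_channel(response):
    """Flatten a channels.list response into one document (hypothetical field names)."""
    items = response.get("items", [])
    if not items:
        return None  # unknown or mistyped channel ID
    item = items[0]
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "channel_name": item["snippet"]["title"],
        # The API returns counts as strings; a count can be absent (e.g. hidden subscribers).
        "subscribers": int(stats.get("subscriberCount", 0)),
        "views": int(stats.get("viewCount", 0)),
        "videos": int(stats.get("videoCount", 0)),
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }
```

With a valid key, `parse_channel(fetch_channel(API_KEY, channel_id))` would give one document per channel, ready to be stored in MongoDB.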
TOOLS:
- YouTube API Key
- Python 3.11.0 or higher.
- PyCharm
- MySQL
- MongoDB
- Streamlit
SKILLS:
- Python scripting
- API integration
- Data Collection
- Data Management using MongoDB and SQL
- Obtain the channel ID of the channel whose data is to be retrieved.
COLLECT TAB
- Enter the channel ID in the "Enter the Channel id" input box in the COLLECT TAB.
- After entering the channel ID, click on the Retrieve and Store data button to retrieve and store the data in MongoDB.
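A rough sketch of what the COLLECT TAB might do behind the button is shown here. The names `store_channel` and `render_collect_tab`, the use of the channel ID as `_id`, and the `fetch_doc` callback are assumptions made for illustration; `replace_one(..., upsert=True)` is the PyMongo collection method, and `st.text_input`, `st.button`, `st.success`, and `st.error` are Streamlit calls.

```python
def store_channel(collection, doc):
    """Insert or overwrite one channel document, keyed by its channel ID."""
    # Using the channel ID as _id means a repeated retrieval replaces the
    # old document instead of creating a duplicate.
    record = {"_id": doc["channel_id"], **doc}
    collection.replace_one({"_id": record["_id"]}, record, upsert=True)
    return record["_id"]


def render_collect_tab(collection, fetch_doc):
    """Draw the input box and button; fetch_doc is a hypothetical callback
    that takes a channel ID and returns a document, or None if not found."""
    import streamlit as st  # imported lazily so store_channel works without Streamlit

    channel_id = st.text_input("Enter the Channel id")
    if st.button("Retrieve and Store data") and channel_id:
        doc = fetch_doc(channel_id)
        if doc is None:
            st.error("No channel found for that ID.")
        else:
            store_channel(collection, doc)
            st.success("Stored " + doc["channel_name"] + " in MongoDB.")


# Example wiring (assumes a local MongoDB and hypothetical database/collection names):
#   from pymongo import MongoClient
#   channels = MongoClient("mongodb://localhost:27017")["youtube"]["channels"]
#   render_collect_tab(channels, my_fetch_function)
```

Upserting rather than inserting is one possible design choice; it keeps the data lake at one document per channel when the same ID is retrieved twice.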
EXTRACT TAB
- The retrieved channel name appears in the dropdown in the EXTRACT TAB.
- Select the channel
- Click on the Migrate to MySQL button.
- Data will be migrated from MongoDB to MySQL.
- Tick the check box to confirm that the channel has been migrated.
- Ticking the check box displays the channels stored.
- The available queries are displayed in the dropdown.
- Select a query in the dropdown.
- The results are displayed based on the selected analysis.
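The migrate-then-query flow of the EXTRACT TAB can be illustrated with a small sketch. To keep it self-contained it uses Python's built-in `sqlite3` as a stand-in for MySQL (the project itself uses MySQL), and the table names, column names, document layout, and query labels are all hypothetical.

```python
import sqlite3

# Hypothetical dropdown entries mapped to the SQL each one runs.
QUERIES = {
    "Videos and their channel names": (
        "SELECT v.title, c.channel_name FROM video v "
        "JOIN channel c ON c.channel_id = v.channel_id ORDER BY v.title"
    ),
    "Total views per channel": (
        "SELECT c.channel_name, SUM(v.views) AS total_views FROM video v "
        "JOIN channel c ON c.channel_id = v.channel_id "
        "GROUP BY c.channel_name ORDER BY total_views DESC"
    ),
}


def create_schema(conn):
    """Create the two warehouse tables used in this sketch."""
    conn.execute("CREATE TABLE channel (channel_id TEXT PRIMARY KEY, channel_name TEXT)")
    conn.execute(
        "CREATE TABLE video (video_id TEXT PRIMARY KEY, channel_id TEXT, "
        "title TEXT, views INTEGER)"
    )


def migrate(conn, docs):
    """Copy channel documents (as read from the data lake) into the tables."""
    for doc in docs:
        conn.execute(
            "INSERT INTO channel VALUES (?, ?)",
            (doc["channel_id"], doc["channel_name"]),
        )
        for video in doc.get("videos", []):
            conn.execute(
                "INSERT INTO video VALUES (?, ?, ?, ?)",
                (video["video_id"], doc["channel_id"], video["title"], video["views"]),
            )
    conn.commit()


def run_query(conn, label):
    """Run the query selected in the dropdown and return its rows."""
    return conn.execute(QUERIES[label]).fetchall()


# Made-up sample documents, shaped like nested MongoDB records.
docs = [
    {"channel_id": "UC1", "channel_name": "Alpha", "videos": [
        {"video_id": "v1", "title": "Intro", "views": 100},
        {"video_id": "v2", "title": "Deep dive", "views": 250},
    ]},
    {"channel_id": "UC2", "channel_name": "Beta", "videos": [
        {"video_id": "v3", "title": "Hello", "views": 40},
    ]},
]
conn = sqlite3.connect(":memory:")
create_schema(conn)
migrate(conn, docs)
rows = run_query(conn, "Total views per channel")
```

In the app, the keys of `QUERIES` could populate the dropdown and the returned rows could be passed to a Streamlit table or chart.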
The result is a user-friendly Streamlit application that lets users search for channel details and join tables to view the data in the app.
DEMO VIDEO : https://www.linkedin.com/feed/update/urn:li:ugcPost:7072515437875351552/