SoberEats is a leading food delivery application in the United States, known for its seamless interface and personalized recommendations. This project aims to build a robust Data Warehouse to enable detailed analytics, improving data-driven decision-making and enhancing customer experience. By integrating a Data Warehouse with SoberEats, the project aims to streamline operations and generate valuable insights for strategic business decisions.
- Create a Data Warehouse to store comprehensive historical data from SoberEats.
- Enable analysis of order patterns, delivery efficiency, customer preferences, and more.
- Use the data for operational insights and strategic business decisions.
- Implement ETL (Extract, Transform, Load) pipelines for data processing.
- Provide dashboards and reports using OLAP queries for visualization.
- A relational model storing details of customers, restaurants, delivery agents, orders, and menu items.
- Each customer is uniquely identified and categorized into:
- Regular Account: Standard features.
- Premium Account: Provides discount codes with expiry dates.
- Orders are linked to delivery agents, who have detailed information stored in the database.
- Designed as a Relational OLAP (ROLAP) model.
- Stores historical data for analysis, including:
- Menu Items: Aggregated information on trending dishes and pricing.
- Order Patterns: Analysis of peak order times, delivery efficiency, and agent performance.
- Customer Behavior: User preferences, interaction data, and order history.
- Operational Database: Implemented in PostgreSQL.
- Data Warehouse: A separate PostgreSQL database created to store aggregated data.
- ETL Process: Utilized Talend for ETL implementation, loading data from the operational database to the Data Warehouse.
- Country, State, City, and Time dimensions loaded.
- Automated using Talend control flows.
- Loaded dimensions for agents, accounts, addresses, restaurants, and menu items.
- Implemented using Slowly Changing Dimensions (SCD) Type 1, 2, and 3.
- Talend used for ETL with Type 2 updates, capturing changes over time.
- Incremental load of order data using Talend.
- Captured detailed order information, linking it to surrogate keys for dimensions.
- Incremental load handled using Talend context variables.
- Total Order Amount by Month
- Aggregates total order amounts per month.
- Average Sales per Year
- Calculates the average sales of all order items per year.
- Top 10 Items by Total Order Amount
- Ranks items based on total order amount.
- Average Delivery Time per Month
- Provides insights on delivery efficiency.
Tableau was used to visualize the insights derived from the Data Warehouse. Key dashboards include:
- Total Order Amount by Month: Displays monthly order trends.
- Top 10 Highest Selling Restaurants: Analyzes restaurant performance by order count.
- Sales by City: Highlights the highest sales regions.
- Average Delivery Time: Shows the delivery efficiency trend over time.
- Order Trends: Total sales and average delivery time analysis.
- Customer Behavior: Insights into user preferences and interaction data.
- Restaurant Performance: Analysis of popular dishes, peak order times, and restaurant efficiency.
- Database: PostgreSQL
- ETL Tool: Talend
- Visualization: Tableau
- Data Warehouse Model: MultiDim model and logical model
- Data Modeling: Relational OLAP
- Handling data consistency during incremental loads.
- Implementing various Slowly Changing Dimensions (SCD) types for different dimensions.
- Integrating OLAP queries with Tableau for comprehensive visualization.
- Advanced Analytics: Implement predictive models for customer behavior analysis.
- Enhanced ETL Pipelines: Optimize ETL workflows for faster data processing.
- Scalability: Migrate to cloud-based data warehousing solutions for improved scalability.