Demo project to mess around with Kafka, Spark, and Cassandra.
Fake Chipotle order data is generated by a Go script under /src. The script serializes each order to JSON and pushes it to a predefined Kafka topic, orders. Spark then subscribes to that topic, casts the raw Kafka values (which travel as []byte) back to strings, parses the JSON, and explodes the nested columns. Finally, order items are aggregated over 5-minute windows and each micro-batch result is pushed to a table in Cassandra (a sketch of the Spark side follows the diagram below).
This workflow lightly resembles a real-world, real-time(ish) reporting scenario. Ideally some BI tool would sit on top of Cassandra to query the data. The configuration and replication of the services is also overly simplified, since the whole project runs on a single DigitalOcean droplet.
+---------+ +---------+ +---------+ +-------------+
| | --> | | | | | |
| ORDER | --> | KAFKA | --> | SPARK | --> | CASSANDRA |
| | --> | | | | | |
+---------+ +---------+ +---------+ +-------------+
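The actual job lives in this repo, but here is a minimal sketch of what the Spark side could look like, assuming the order schema from the example payload further down and the top_orders_every_5_mins table from the Cassandra step. The object name, connection hosts, watermark, and output mode are assumptions for illustration, and the sketch expects spark-sql-kafka and the spark-cassandra-connector on the classpath.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object OrdersStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("chipotle-orders")
      .config("spark.cassandra.connection.host", "localhost") // assumes Cassandra on the same droplet
      .getOrCreate()

    // Schema matching the example order JSON produced by the Go script
    val orderSchema = new StructType()
      .add("time", StringType)
      .add("customer", new StructType()
        .add("name", StringType)
        .add("age", IntegerType)
        .add("city", StringType)
        .add("state", StringType))
      .add("order", ArrayType(new StructType()
        .add("name", StringType)
        .add("price", DoubleType)))
      .add("creditCard", StringType)

    // Kafka hands the value over as bytes, so cast back to string before parsing the JSON,
    // then explode the nested order array into one row per item
    val items = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "orders")
      .load()
      .select(from_json(col("value").cast("string"), orderSchema).as("o"))
      .select(col("o.time").cast("timestamp").as("time"), explode(col("o.order")).as("item"))

    // Count each menu item per 5-minute window
    val topOrders = items
      .withWatermark("time", "5 minutes")
      .groupBy(window(col("time"), "5 minutes"), col("item.name").as("name"))
      .count()
      .select(
        col("window.start").as("start"),
        col("window.end").as("end"),
        col("name"),
        col("count").cast("int").as("count"))

    // Push every micro-batch into the Cassandra table created below; inserts on the same
    // (start, name) key act as upserts, so updated counts simply overwrite older ones
    val writeBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "chipotle", "table" -> "top_orders_every_5_mins"))
        .mode("append")
        .save()
    }

    topOrders.writeStream
      .outputMode("update")
      .foreachBatch(writeBatch)
      .start()
      .awaitTermination()
  }
}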
- Create Kafka Topic
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic orders
- Set Up Cassandra
create keyspace chipotle with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
create table chipotle.top_orders_every_5_mins (
start timestamp,
end timestamp,
name text,
count int,
primary key (start, name)
);
- Submit Spark Job + Streaming
make spark
- Create Fake Order Data
cd src && go run *.go

Each generated order looks something like this:
{
"time": "2022-09-14 22:03:15",
"customer": {
"name": "Yoshiko McGlynn",
"age": 43,
"city": "Chula Vista",
"state": "Texas"
},
"order": [
{
"name": "Patron Margarita",
"price": 7.15
},
{
"name": "Salad (Chicken)",
"price": 6.5
}
],
"creditCard": "Mastercard"
}
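There is no BI layer in this setup, so the quickest way to sanity-check the aggregates is cqlsh, or reading the table back through the same Cassandra connector. A minimal sketch, assuming the connector is on the classpath (the object name and connection host are made up for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

object TopOrdersCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("top-orders-check")
      .config("spark.cassandra.connection.host", "localhost") // assumes local Cassandra
      .getOrCreate()

    // Read the aggregated table back out of Cassandra and show the biggest sellers first
    spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "chipotle", "table" -> "top_orders_every_5_mins"))
      .load()
      .orderBy(desc("count"))
      .show(20, truncate = false)
  }
}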
- Spark -> v 3.2.0
- Scala -> v 2.12.15
- SBT -> v 1.7.1
- Cassandra -> v 4.0.5
- Kafka -> v 3.2.1
- Go -> v 1.18.1