Publish/subscribe messaging, or pub/sub messaging, is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any message published to a topic is immediately received by all of the subscribers to the topic. Pub/sub messaging can be used to enable event-driven architectures, or to decouple applications in order to increase performance, reliability and scalability.
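The fan-out behaviour described above can be illustrated with a toy in-memory broker. This is only a sketch of the idea, not how Network Rail's feeds or any real message broker are implemented; the `Broker` class, the `trains` topic name and the message text are all made up for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy broker (hypothetical): routes each message to every subscriber of its topic."""

    def __init__(self) -> None:
        # Maps a topic name to the list of callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # The publisher never talks to subscribers directly; it only names a topic.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of subscribers that received the message


broker = Broker()
inbox_a: List[str] = []
inbox_b: List[str] = []

# Two independent subscribers on the same (made-up) topic.
broker.subscribe("trains", inbox_a.append)
broker.subscribe("trains", inbox_b.append)

delivered = broker.publish("trains", "train 1A23 entered berth 0042")
print(delivered, inbox_a, inbox_b)
```

Both inboxes end up with the same message even though the publisher knows nothing about either subscriber, which is the decoupling that pub/sub provides.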
In this hands-on Beginners Machine Learning online workshop, you will learn how to listen to live feeds and how to parse and save the messages you receive.
During the workshop, we will listen to Network Rail's live train describer data in a Jupyter notebook. We will first look at how to listen to these messages, then at how to define our own behaviours to save and parse the messages received.
- How can one listen to live data feeds
- How can one save messages received from live data feeds
- What are the differences between pub-sub feeds and APIs
- What is JSON and how can one parse it into tabular format
This workshop consists of 3 lessons and 4 projects:
LESSONS
- Pub-Sub messaging systems and how to subscribe to one
- Saving messages from live feeds as you listen to them
- Parsing JSON semi-structured messages into a tabular structure
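As a taste of the third lesson, here is a minimal sketch of turning semi-structured JSON into rows and columns using only the standard library. The sample payload is illustrative: it assumes each message is a small object wrapped under a message-type key, and the field names and values are placeholders rather than a guaranteed description of the real feed.

```python
import csv
import io
import json
from typing import Dict, List

# Illustrative payload (assumed shape): a list of wrappers, each keyed by message type.
raw = """[
  {"CA_MSG": {"time": "1349696911000", "area_id": "SK", "msg_type": "CA",
              "from": "3647", "to": "3649", "descr": "1F42"}},
  {"CC_MSG": {"time": "1349696912000", "area_id": "G1", "msg_type": "CC",
              "to": "G669", "descr": "2J01"}}
]"""


def to_rows(payload: str) -> List[Dict[str, str]]:
    """Unwrap each message so that one message becomes one flat row."""
    rows = []
    for wrapper in json.loads(payload):
        for body in wrapper.values():
            rows.append(dict(body))
    return rows


rows = to_rows(raw)

# Messages do not all carry the same fields, so take the union as the column set.
columns = sorted({key for row in rows for key in row})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)  # missing fields are written as empty cells
table = buffer.getvalue()
print(table)
```

If pandas is installed, `pandas.DataFrame(rows)` builds an equivalent table from the same list of dictionaries.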
PROJECTS
- Creating a fully featured pub-sub command line application that works with any Network Rail data feed
- Saving pub-sub messages on a computer that has limited memory
- Saving pub-sub messages in an organised format
- Writing algorithms to parse and save messages from different Network Rail data feeds in tabular format
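One possible approach to the limited-memory project is to append each message to a file as it arrives instead of collecting messages in a list. The sketch below assumes messages arrive as JSON strings; `fake_feed`, `save_messages` and the file name are hypothetical stand-ins, and the project solutions may take a different route.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator


def save_messages(messages: Iterable[str], path: str) -> int:
    """Append messages to a JSON Lines file, holding only one message in memory at a time."""
    count = 0
    with open(path, "a", encoding="utf-8") as handle:
        for message in messages:
            # Re-serialise so every message occupies exactly one line in the file.
            handle.write(json.dumps(json.loads(message)) + "\n")
            count += 1
    return count


def fake_feed() -> Iterator[str]:
    """Hypothetical stand-in for a live feed: yields messages one by one."""
    for berth in ("0042", "0044", "0046"):
        yield json.dumps({"descr": "1A23", "to": berth})


out_path = os.path.join(tempfile.mkdtemp(), "td_messages.jsonl")
saved = save_messages(fake_feed(), out_path)
print(saved, out_path)
```

Because the file is opened in append mode and written line by line, memory use stays flat no matter how long the feed runs, and the file can later be read back one line at a time.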
The YouTube video tutorial is currently being filmed and edited. Please check back later for a link to the video.
While you won't need prior experience with pub-sub systems, we will assume basic programming experience with Python and package/environment managers such as pip, conda or pipenv.
- Conda installed locally (https://docs.anaconda.com/anaconda/install/)
- Jupyter Notebook installed locally (https://jupyter.readthedocs.io/en/latest/install.html)
For a refresher on Python programming, we recommend the following free course:
- Python programming language: See Udacity - Intro to Python
Use the following guides to set up your development environment:
- If using Conda
  - Create the environment for the project with conda (see prerequisite No. 1):
    conda env create -f environment.yaml
  - Activate the environment:
    conda activate bml-live-feed
- If using VirtualEnv
  - Install your dependencies:
    pip install -r requirements.txt
  - Add the virtual environment to the Jupyter notebook kernels:
    python -m ipykernel install --user --name=${environment_name}
    This should print "Installed kernelspec ${environment_name} in ${dir}".
  - Go to /notebooks/live-feed-exercise.ipynb and select
    kernel/change kernel/${environment_name}
    (every time you open a new lesson, you will likely have to select your kernel again).
  - You're ready to go!
- Clone this git repository using git clone https://github.com/beginners-machine-learning-london/intro-live-data-feeds.git or download the repo as a zip file to get started.
- Set up your development environment with conda or pipenv, using the requirements.txt file.
- Listen to the lectures, then work your way through the notebooks in the notebooks folder.
- Complete the project sections in the notebooks after the workshop.
- Join our Slack group to get access to this workshop's private channel, where you can ask questions and connect with your classmates.
- Submit the GitHub repo link to your completed projects on our website for grading and a chance to earn a course certificate.
IMPORTANT NOTE: Attempt to complete the projects by yourself, using the YouTube tutorial and searching online. If you get stuck and cannot progress any further, take a look at the solutions in the solutions folder.
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
- Pandas: pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- Stomp: Stomp.py is a Python library providing access to a message broker using the STOMP protocol - either programmatically or using a command line client.
- Stomp.py Documentation - Use this resource as the main resource for this class.
- STOMP Protocol Specification, Version 1.2 - Use this resource to understand more about messaging systems that use the STOMP protocol.
- Network Rail Open Data Feeds - Refer to this resource to understand what data feeds are available to subscribe to.
- Open Rail Data Feeds Wiki - Check out this resource which has information on the Open Data available from the rail industry in Great Britain.
- BML Slack Channel - Join our Slack workspace to collaborate with others, discuss ideas and post any questions you have about our group or the workshops.
- How was this workshop? Please provide us with some feedback here so that we can improve the content and delivery of future workshops.