In this 3-hour workshop, you will take a large dataset of UFO sightings (80,000 reports over the past century) from the National UFO Reporting Center (NUFORC) and use Amazon's machine learning service (SageMaker) to identify the 10 locations most likely to have UFO sightings. To do so, you will use an unsupervised machine learning algorithm.
You will then take your trained model, deserialise it, convert its output to CSV format and visualise it on a map using AWS QuickSight to see where these locations are. You can then try correlating these locations with landmarks.
- What areas of the world are most likely to have UFO sightings?
- Do clusters of UFO sightings correlate with landmarks, such as airports or research centers?
This repository contains instructions for setting up your AWS account to get you started with Amazon SageMaker, S3 and QuickSight. Workshop assignments are included in the exercises folder.
- Learn the difference between Artificial Intelligence, Machine Learning and Deep Learning
- Learn how the K-means clustering algorithm works
- See unsupervised machine learning LIVE in the CLOUD!
- Learn the process of loading and preparing data for training
- Experiment with AWS SageMaker, S3 and QuickSight
- Creating and configuring IAM roles and permissions in AWS
- Creating and configuring buckets in AWS
- Creating and configuring Jupyter notebooks in AWS
- Reading and Writing to your buckets via Jupyter Notebook
- Cleaning and Wrangling data with Pandas and Numpy packages
- Training models in the cloud using unsupervised machine learning and saving them
- Deserialising a trained model's output
- Setting up AWS QuickSight data sources
- Visualising the results of your model's output on a map using AWS QuickSight
- Most importantly, you will learn how to use AWS SageMaker to cluster your data using unsupervised machine learning algorithms
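As a rough illustration of what k-means does with sighting coordinates, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the SageMaker built-in implementation used in the exercises. The `kmeans` function, the sample coordinates and the use of plain Euclidean distance on latitude/longitude (which ignores the Earth's curvature) are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2
                + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Made-up (latitude, longitude) pairs forming two obvious groups.
sightings = [(51.5, -0.1), (51.6, -0.2), (51.4, 0.0),
             (40.7, -74.0), (40.8, -73.9), (40.6, -74.1)]
centroids, labels = kmeans(sightings, k=2)
```

In the workshop itself the number of clusters would be 10 and the training would run on SageMaker; the centroids play the role of the "most likely" locations.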
While you won't need prior experience in practical machine learning or with AWS to follow along with this workshop, we'll assume some familiarity with:
- Python programming language: See Udacity - Intro to Python
- Getting Started with AWS: See AWS Getting Started Resource Center
- pandas and NumPy Python packages
This workshop consists of three activities:
- Setting up AWS accounts and S3 buckets
- Processing the dataset with AWS SageMaker
- Visualising the model outputs with AWS QuickSight on a map
Note: If you get stuck, first try to solve the problem yourself as far as you can, for example by searching online. Refer to the solutions notebook only if you are still stuck.
- Clone this git repository using
git clone https://github.com/beginners-machine-learning-london/intro_to_unsupervised_ml_with_AWS_Sagemaker
- Create an AWS account and set a budget before doing anything else
- Set up your AWS user accounts and permissions (IAM User, Roles, Groups and Permissions). Alternatively, use your root account to avoid dealing with permissions, although this is not AWS best practice.
- Create an S3 bucket and upload the UFO dataset to your bucket
- Create an AWS SageMaker notebook instance using the default options. You may need to create an AWS role while creating the instance.
- Clone this GitHub repo into the AWS SageMaker notebook instance
- Complete the Jupyter notebook assignments on AWS and save the model outputs in your S3 bucket
- Create an AWS QuickSight account
- Configure AWS QuickSight using the my-manifest.json file to visualise your model output on a map
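For orientation, a QuickSight S3 manifest is a small JSON file that tells QuickSight where the CSV lives and how to parse it. The sketch below builds one in Python. The bucket name, the object key and the `build_manifest` helper are placeholders made up for this example, and the my-manifest.json shipped with this repository may differ in its details.

```python
import json


def build_manifest(bucket, key):
    """Return a QuickSight S3 manifest pointing at one CSV file."""
    return {
        "fileLocations": [{"URIs": [f"s3://{bucket}/{key}"]}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "containsHeader": "true",
        },
    }


# Placeholder bucket and key; replace them with your own values.
manifest = build_manifest("my-ufo-bucket", "results/ufo_centroids.csv")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```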
IMPORTANT NOTE: Make sure to shut down your SageMaker notebook instance when you are done. Otherwise, you will be charged for it by the hour; it will not shut down automatically. Training provisions a second instance, but that instance stops running once the training job has finished.
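You can stop the notebook instance from the SageMaker console. If you prefer to script it, the sketch below shows one possible approach using the boto3 SageMaker client's `describe_notebook_instance` and `stop_notebook_instance` calls. The helper name and the instance name in the usage comment are hypothetical, and the helper assumes you pass in a client that already has valid credentials.

```python
def stop_notebook_instance(sagemaker_client, name):
    """Stop a notebook instance if it is running; return True if a stop was requested."""
    status = sagemaker_client.describe_notebook_instance(
        NotebookInstanceName=name
    )["NotebookInstanceStatus"]
    if status == "InService":
        sagemaker_client.stop_notebook_instance(NotebookInstanceName=name)
        return True
    # Already stopped, stopping, or in some other state: nothing to do here.
    return False


# Usage (needs boto3 and AWS credentials; the instance name is a placeholder):
#   import boto3
#   stop_notebook_instance(boto3.client("sagemaker"), "ufo-workshop-notebook")
```

Checking the status first avoids sending a stop request for an instance that is not in service.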
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
- AWS: Amazon Web Services is a subsidiary of Amazon that provides on-demand cloud computing platforms to individuals, companies, and governments, on a metered pay-as-you-go basis.
The National UFO Reporting Center (NUFORC) is an organization in the United States that investigates UFO sightings and/or alien contacts.
The UFO sightings dataset for this workshop can be downloaded from Kaggle. A static copy is also provided in the exercises/datasets folder.
- A Cloud Guru - AWS Certified Machine Learning - Specialty 2019: Enjoyed this workshop? The content was inspired by A Cloud Guru's AWS Certified Machine Learning - Specialty 2019.
- Towards Data Science: How Does k-Means Clustering in Machine Learning Work?
- Towards Data Science: Clustering using K-means algorithm
- The Amazon SageMaker K-means algorithm Documentation: K-Means Algorithm
- AWS Guide to Using Amazon SageMaker Built-in Algorithms: K-Means Algorithm
- BML Slack Channel - Join our Slack workspace to collaborate with others, discuss ideas and post any questions you have about our group or the workshops
- Have questions about workshop exercises or setting up your AWS account and configurations? Post them here
- How was this workshop? Please provide us with some feedback here so that we can improve the content and delivery of future workshops.