zebzafr/Investigate-TMDB-Movie-Dataset

Investigate-TMDB-Movie-Dataset

Project Overview

In this project, I had to pick one of Udacity's curated datasets and investigate it using NumPy and Pandas. I chose the TMDB dataset, which has over 10,000 observations, and applied the entire data analysis process, starting by posing a question and finishing by sharing my findings.

Project Motivation

In the supporting lesson content, I was introduced to the key steps in the data analysis process:

  • Choosing a dataset
  • Asking questions
  • Data wrangling
  • Exploratory data analysis
  • Drawing conclusions

I had to apply the lessons learned to see how all the steps fit together to answer my questions. I used Python and some of its libraries to wrangle, explore, analyze, and visualize the data, which made implementing the data analysis process much easier.
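As a rough illustration of how the wrangling, exploration, and conclusion steps connect, here is a minimal sketch in Pandas. It is not the notebook's actual code: the column names (`title`, `release_year`, `budget`, `revenue`), the helper functions `wrangle` and `explore`, and the sample rows are all assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows for illustration; NOT taken from the real TMDB data.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "B", "C", "D"],
        "release_year": [2001, 2005, 2005, 2010, 2010],
        "budget": [10.0, 0.0, 0.0, 40.0, 20.0],  # assumed: 0 means "unknown"
        "revenue": [30.0, 15.0, 15.0, 100.0, 10.0],
    }
)


def wrangle(df):
    """Data wrangling: drop duplicate rows and rows with an unknown budget."""
    cleaned = df.drop_duplicates().copy()
    # Treat a zero budget as a missing value so it can be dropped.
    cleaned["budget"] = cleaned["budget"].replace(0.0, np.nan)
    return cleaned.dropna(subset=["budget"])


def explore(df):
    """Exploratory analysis: average profit per release year."""
    with_profit = df.assign(profit=df["revenue"] - df["budget"])
    return with_profit.groupby("release_year")["profit"].mean()


cleaned = wrangle(movies)
profit_by_year = explore(cleaned)

# Drawing conclusions: inspect the summary that answers the question posed.
print(profit_by_year)
```

Keeping each step in its own function makes it easier to rerun a single stage in a notebook cell when the question or the cleaning rules change.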

Requirements

The project requires Python 3 plus the following Python libraries:

  • Pandas
  • NumPy
  • Matplotlib

I used Jupyter Notebook to run the code.
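One way to confirm that the libraries listed above are available before opening the notebook is a short check like the one below. This is only a sketch: the helper name `missing_packages` is hypothetical, and the check only verifies that each package can be found, not that a particular version is installed.

```python
from importlib.util import find_spec

# Import names of the libraries listed in the requirements.
REQUIRED = ["pandas", "numpy", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```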

Learning Outcomes

After completing the project, I learned how to:

  • Follow the key steps in a typical data analysis process
  • Pose and answer questions with a given dataset
  • Investigate problems in a dataset and wrangle the data into a usable format
  • Communicate the results of an analysis
  • Use vectorized operations in NumPy and Pandas to speed up data analysis code
  • Work with pandas' Series and DataFrame objects, which make accessing data more convenient
  • Use Matplotlib to produce plots showing the findings
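To make the points about vectorized operations and Series/DataFrame access more concrete, here is a small sketch. The data and column names (`budget`, `revenue`, `profit`) are invented for illustration and are not taken from the project notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration; NOT taken from the real TMDB data.
df = pd.DataFrame(
    {"budget": [10.0, 40.0, 20.0], "revenue": [30.0, 100.0, 10.0]},
    index=["A", "C", "D"],  # assumed movie titles used as row labels
)

# Row-by-row loop: works, but is slow on large DataFrames.
loop_profit = []
for _, row in df.iterrows():
    loop_profit.append(row["revenue"] - row["budget"])

# Vectorized equivalent: one expression applied to whole columns at once.
df["profit"] = df["revenue"] - df["budget"]

# NumPy functions also operate element-wise on a pandas Series.
log_revenue = np.log10(df["revenue"])

# Series and DataFrame objects allow label-based and boolean access.
profitable = df[df["profit"] > 0]   # keep rows with a positive profit
best = df["profit"].idxmax()        # row label of the highest profit

print(profitable)
print("Most profitable:", best)
```

Both approaches give the same profit values; the vectorized form is shorter and usually much faster because the arithmetic runs in compiled code rather than in a Python loop.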
