Predicting Tuna Fish Location in Indonesia Sea

This project was created for the Frontier Technology class (Data Science topic) at Universitas Pelita Harapan.

This repository contains predictions of tuna locations in Indonesian waters. The predictions are derived from multiple datasets, including data provided by Global Fishing Watch. The repository also contains data on ships that caught tuna from 2012 to 2016.

Getting started

To use this repository, you'll need:
- R (this project uses version 3.5.1)
- RStudio
- An internet browser
- Java

Run R and install the following packages:
- shiny
- leaflet
- ggmap
- ncdf4
- naivebayes
- ggplot2

To install packages, you can use the code:

install.packages("insert package name here")

For example, to install shiny, enter:

install.packages("shiny")
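
To avoid installing the packages one by one, the list above can be compared against what is already installed. This is only a minimal sketch: the helper name `find_missing_packages` is hypothetical and not part of this repository, and the actual install call is left commented out because it needs an internet connection.

```r
# Packages listed in this README.
required <- c("shiny", "leaflet", "ggmap", "ncdf4", "naivebayes", "ggplot2")

# Hypothetical helper: return the required packages that are not installed yet.
find_missing_packages <- function(pkgs,
                                  installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

missing_pkgs <- find_missing_packages(required)

# Uncomment to install whatever is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```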

You will also need data from the following sources:
- Global Fishing Watch in csv format
- NOAA High Resolution SST in netCDF format
- OceanWatch - Chlorophyll a Concentration in netCDF format

Usage

To run this R project, follow these steps in RStudio:

Clone this repository to your directory

git clone "https://github.com/stevenalbert/tuna-prediction"

Open tuna-prediction.Rproj with RStudio
Open server.R or ui.R in RStudio and click Run App, or run the command runApp() instead.
Enjoy the application.

Implementation

Filtering and Data Mapping

Global Fishing Watch provides data on all fishing ships. Although the dataset is large, this project only needs the ships fishing in Indonesian waters. To select them, we filtered the ship data to the following coordinate range:

Latitude of -14° ↔ 8°
Longitude of 85° ↔ 142°
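
The bounding-box filter above could be sketched as follows. The data frame is a toy stand-in for the Global Fishing Watch data, and the column names `lat`, `lon`, and `fishing_hours` as well as the helper `in_indonesia` are assumptions made for illustration; the real files may use different names and units.

```r
# Toy stand-in for the ship data (column names are assumptions).
ships <- data.frame(
  lat = c(-20.5, -6.2, 3.1, 9.4),
  lon = c(110.0, 106.8, 141.9, 120.0),
  fishing_hours = c(1.5, 0.0, 2.3, 4.0)
)

# Keep only rows inside the latitude/longitude range listed above.
in_indonesia <- function(df) {
  keep <- df$lat >= -14 & df$lat <= 8 & df$lon >= 85 & df$lon <= 142
  df[keep, , drop = FALSE]
}

indonesia_ships <- in_indonesia(ships)
```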

We planned to filter out ships whose gear types are not used for tuna fishing. However, every ship in our coordinate range is equipped with gear suitable for catching tuna, so we assume that any ship that is fishing is probably fishing for tuna.

Because the Global Fishing Watch data alone isn't enough to predict tuna locations, we also extract data from NOAA High Resolution SST, which provides daily sea surface temperature (SST), and from OceanWatch, which provides weekly chlorophyll data. We then map the temperature and chlorophyll data onto the Global Fishing Watch ship data. If no data is available for a given date, we fill the value with NA (Not Available).

Our ship data is daily, while the chlorophyll data is unevenly spaced in time, mostly at intervals of 7 days 23h 4m 19s, so we map each ship record to the closest chlorophyll date. The SST data needs no such adjustment because it is already daily.
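
Matching each ship record to the closest chlorophyll date can be sketched like this. The helper `nearest_date` and the example dates are hypothetical; the project's own code may resolve ties or handle missing dates differently.

```r
# Hypothetical helper: for each ship date, pick the closest chlorophyll date.
# Ties go to the earlier chlorophyll date because which.min returns the first minimum.
nearest_date <- function(ship_dates, chl_dates) {
  ship_num <- as.numeric(ship_dates)
  chl_num <- as.numeric(chl_dates)
  idx <- vapply(ship_num,
                function(d) which.min(abs(chl_num - d)),
                integer(1))
  chl_dates[idx]
}

ship_dates <- as.Date(c("2012-01-03", "2012-01-06"))
chl_dates <- as.Date(c("2012-01-01", "2012-01-09"))
matched <- nearest_date(ship_dates, chl_dates)
```

Here the first ship date is two days from the first chlorophyll date, and the second is three days from the second, so each is matched accordingly.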

The ship, SST, and chlorophyll data all have different latitude and longitude resolutions. We rescale the SST and chlorophyll data to a 1° grid, then map the ship data by rounding its coordinates to the nearest grid value of the SST and chlorophyll data.
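
One way to bring fine-resolution values onto a 1° grid is to average all points that fall in the same cell. The sketch below assumes cells are identified by `floor()` of the coordinates and uses made-up column names (`lat_cell`, `lon_cell`, `chl`); the project itself may use a different convention, such as cell centres or rounding.

```r
# Toy fine-resolution chlorophyll points (values are made up).
fine <- data.frame(
  lat = c(0.025, 0.475, 0.975, 1.025),
  lon = c(100.025, 100.525, 100.975, 100.025),
  chl = c(0.2, 0.4, NA, 1.0)
)

# Assign every point to a 1-degree cell (assumed convention: floor).
fine$lat_cell <- floor(fine$lat)
fine$lon_cell <- floor(fine$lon)

# Average per cell; rows with NA chlorophyll are dropped by the formula interface.
coarse <- aggregate(chl ~ lat_cell + lon_cell, data = fine, FUN = mean)

# Map a ship position onto the same grid and look up its chlorophyll value.
ship <- data.frame(lat = 0.73, lon = 100.41)
ship$lat_cell <- floor(ship$lat)
ship$lon_cell <- floor(ship$lon)
ship_chl <- merge(ship, coarse, by = c("lat_cell", "lon_cell"), all.x = TRUE)
```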

To classify the data, we assume that a ship with fishing hours above zero is fishing for tuna. The tuna value at each coordinate is therefore either 0 or 1.
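
The labelling rule can be expressed in one line. The column names `fishing_hours` and `tuna` and the sample rows are assumptions for illustration.

```r
# Toy gridded effort data (column names and values are assumptions).
effort <- data.frame(
  lat_cell = c(-6, -6, 3),
  lon_cell = c(106, 107, 141),
  fishing_hours = c(0, 2.5, 0.4)
)

# Label a cell 1 when any fishing took place there, otherwise 0.
effort$tuna <- as.integer(effort$fishing_hours > 0)
```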

Data Prediction

The prediction data is built from sea surface temperature and chlorophyll-a, from 2012-01-01 to 2018-03-31. It is created per day in 1° x 1° tiles, each combined with its sea surface temperature and chlorophyll-a values. Rows can contain NA values, so we remove every row that has one. The result is saved in the prediction_data directory.
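
Building one day of prediction tiles and dropping incomplete rows might look like the sketch below. It uses a tiny 3 x 2 grid with made-up `sst` and `chl` values instead of the full coordinate range, and all column names are assumptions.

```r
# One row per 1-degree tile for a single day (tiny grid for illustration).
grid <- expand.grid(lat = -1:1, lon = 100:101)
grid$date <- as.Date("2012-01-01")

# Made-up environmental values; NA marks tiles with no data that day.
grid$sst <- c(29.1, 28.4, NA, 28.9, 29.3, 28.0)
grid$chl <- c(0.31, NA, 0.42, 0.18, 0.25, 0.77)

# Keep only tiles where both predictors are available.
prediction_data <- grid[complete.cases(grid), ]
```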

Prediction with Naive Bayes

With the data prepared, we predict tuna locations using a Naive Bayes classifier.

Bayes Theorem

Our bayes formula

The probability density function for the normal distribution is defined by two parameters (mean and standard deviation).
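
For reference, the standard forms of these formulas are sketched below, assuming the usual Gaussian Naive Bayes formulation with SST and chlorophyll treated as conditionally independent features. The symbols ($C$ for the class, $x$ for a feature value, $\mu$ and $\sigma$ for the per-class mean and standard deviation) are our notation and may differ from the project's own.

```latex
% Bayes' theorem for class C given an observation x
P(C \mid x) = \frac{P(x \mid C)\, P(C)}{P(x)}

% Naive Bayes with two features (SST and chlorophyll)
P(C \mid \mathrm{sst}, \mathrm{chl}) \propto
  P(C)\, f(\mathrm{sst} \mid \mu_{\mathrm{sst},C}, \sigma_{\mathrm{sst},C})\,
  f(\mathrm{chl} \mid \mu_{\mathrm{chl},C}, \sigma_{\mathrm{chl},C})

% Normal probability density function
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}
  \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```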

Naive Bayes Model from Training Data

Sea Surface Temperature

SST	0 (No Tuna)	1 (Tuna)
Mean	29.098720	28.240935
SD	1.188109	1.102100

Chlorophyll

Chlorophyll	0 (No Tuna)	1 (Tuna)
Mean	0.7860678	0.3662977
SD	1.1794882	0.8036224

Confusion Matrix

	Actual: NO	Actual: YES
Predicted: NO	194743	70047
Predicted: YES	119649	194014

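From the confusion matrix, we can compute the accuracy of the Naive Bayes model: the number of correct predictions (the diagonal of the matrix) divided by the total number of predictions. Here that is (194743 + 194014) / 578453, or about 67.2%.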
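
The accuracy calculation can be reproduced directly from the counts in the confusion matrix. This is a small sketch; the variable names are ours.

```r
# Confusion matrix from the table above (rows: predicted, columns: actual).
# matrix() fills column by column, so the first two numbers are the "Actual: NO" column.
cm <- matrix(
  c(194743, 119649, 70047, 194014),
  nrow = 2,
  dimnames = list(Predicted = c("NO", "YES"), Actual = c("NO", "YES"))
)

# Accuracy: correct predictions (diagonal) over all predictions.
accuracy <- sum(diag(cm)) / sum(cm)
```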

To calculate tuna probability we use the Normal Distribution formula
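
As an illustration, the means and standard deviations from the model tables can be plugged into R's built-in `dnorm` to score a single tile. The function name `tuna_posterior` is hypothetical, and the equal class priors are an assumption, since the priors used by the project are not stated here.

```r
# Hypothetical scoring function using the trained means and SDs listed above.
# prior is the assumed prior probability of each class (equal by default).
tuna_posterior <- function(sst, chl, prior = c(no = 0.5, yes = 0.5)) {
  # Class-conditional likelihoods under the naive independence assumption.
  lik_no <- dnorm(sst, mean = 29.098720, sd = 1.188109) *
    dnorm(chl, mean = 0.7860678, sd = 1.1794882)
  lik_yes <- dnorm(sst, mean = 28.240935, sd = 1.102100) *
    dnorm(chl, mean = 0.3662977, sd = 0.8036224)

  # Bayes' theorem: normalise the weighted likelihoods.
  num_yes <- prior[["yes"]] * lik_yes
  num_yes / (prior[["no"]] * lik_no + num_yes)
}

p_cool_clear <- tuna_posterior(sst = 28, chl = 0.3)
p_warm_green <- tuna_posterior(sst = 30, chl = 1.5)
```

With these parameters, cooler water with low chlorophyll scores above 0.5 and warmer water with high chlorophyll scores below 0.5.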

Data Visualization

To visualize the data, we use shiny to show the ship locations and the tuna prediction results.

In the application, users can move the slider to change the date, and the information shown updates to match the selected date.

The application shows four pieces of information. In the home tab, the right side shows the density of the prediction, while the left side shows the grouping of the density. Users can scroll the mouse over the left side to show more detailed grouping on the map. Scrolling down makes the map show more precise positions and grouping.

On the Details tab, users can check the Naive Bayes model graph, which shows the SST and chlorophyll density distributions from the training data. The red line shows the distribution for places with no tuna, and the dashed green line shows the distribution for places with tuna.
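
A graph like the SST one can be approximated with base R graphics from the model parameters alone. This is only a sketch: the application itself may draw its graphs differently (for example with ggplot2), and the SST range, the `curves` data frame, and the `plot_sst_model` helper are assumptions.

```r
# Evaluate both class densities over an assumed SST range.
sst_grid <- seq(24, 34, by = 0.1)
curves <- data.frame(
  sst = sst_grid,
  no_tuna = dnorm(sst_grid, mean = 29.098720, sd = 1.188109),
  tuna = dnorm(sst_grid, mean = 28.240935, sd = 1.102100)
)

# Hypothetical helper: red solid line for no tuna, green dashed line for tuna.
plot_sst_model <- function(df) {
  plot(df$sst, df$no_tuna, type = "l", col = "red",
       ylim = range(df$no_tuna, df$tuna),
       xlab = "SST", ylab = "Density")
  lines(df$sst, df$tuna, col = "darkgreen", lty = 2)
  legend("topright", legend = c("No tuna", "Tuna"),
         col = c("red", "darkgreen"), lty = c(1, 2))
}

# Call plot_sst_model(curves) to draw the graph.
```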

Notes

Some files included in this repository:

fishing_effort/ and train_data.csv: Filtered data used for training
prediction_data/: Data for prediction
chlorophyll/: Scaled data from 0.05° x 0.05° to 1° x 1° of OceanWatch Chlorophyll-a data (from 2012 to 2017 and a bit of 2018)
data.R: Functions used for filtering fishing data, except for the latitude and longitude filter

Files excluded from this repository:

Sea surface temperature data from NOAA
Unscaled Chlorophyll-a data

Developed by

License

All data used above is owned by its respective owners.

This project was made and developed for educational purposes only.
