This project was created for the Frontier Technology class (Data Science topic) at Universitas Pelita Harapan.
This repository contains predictions of tuna locations in Indonesian waters. The predictions are derived from multiple data sets provided by Global Fishing Watch. The repository also contains data on ships that caught tuna from 2012 to 2016.
- To use this repository correctly, you'll need:
  - R (this project uses version 3.5.1)
  - RStudio
  - An internet browser
  - Java
- Run R and install the following packages:
  - shiny
  - leaflet
  - ggmap
  - ncdf4
  - naivebayes
  - ggplot2

  To install a package, run `install.packages("insert package name here")`. For example, to install shiny, run `install.packages("shiny")`.
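As a convenience, the six packages listed here can be checked and installed in one pass. This is only a sketch: `missing_packages` is a hypothetical helper name, and the install step is guarded by `interactive()` so that sourcing the snippet has no side effects.

```r
# Packages listed in this README.
required <- c("shiny", "leaflet", "ggmap", "ncdf4", "naivebayes", "ggplot2")

# Hypothetical helper: return the packages that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)

# Install only what is missing, and only in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```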
- You will also need data from several sources:
  - Global Fishing Watch, in csv format
  - NOAA High Resolution SST, in netCDF format
  - OceanWatch Chlorophyll a Concentration, in netCDF format
To run this R project in RStudio, follow these steps:
- Clone this repository to your directory: `git clone "https://github.com/stevenalbert/tuna-prediction"`
- Open `tuna-prediction.Rproj` with RStudio.
- Open `server.R` or `ui.R` of this project in RStudio and click Run App, or just use the command `runApp()`.
- Enjoy the application.
Using Global Fishing Watch, we can extract data on all fishing ships. Although the data set is large, we currently need only the ships fishing in Indonesian waters. To restrict the data to that area, we keep only the ships within the following coordinates:
- Latitude of -14° ↔ 8°
- Longitude of 85° ↔ 142°
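The bounding-box filter can be sketched as follows. The column names `lat`, `lon`, and `fishing_hours`, the sample rows, and the choice of inclusive bounds are assumptions made for illustration; the actual Global Fishing Watch columns may be named or scaled differently.

```r
# TRUE for coordinates inside the Indonesian-waters box used in this README.
# Inclusive bounds are an assumption.
in_indonesian_box <- function(lat, lon) {
  lat >= -14 & lat <= 8 & lon >= 85 & lon <= 142
}

# Hypothetical ship records (degrees).
ships <- data.frame(
  lat = c(-5.2, 20.0, 0.5),
  lon = c(110.3, 120.0, 150.0),
  fishing_hours = c(3.1, 0, 1.2)
)

# Keep only the rows inside the box.
ships_id <- ships[in_indonesian_box(ships$lat, ships$lon), ]
```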
We planned to filter out ships whose gear types are not used for tuna fishing. However, all the ships in our coordinate range are equipped with gear types used to fish tuna, so we simply assume that every ship that is fishing is probably fishing tuna.
Because the data from Global Fishing Watch alone isn't enough to predict tuna locations, we extract additional data from NOAA High Resolution SST, which provides daily sea surface temperature (SST), and from OceanWatch, which provides weekly chlorophyll data. We then map the temperature and chlorophyll data onto the Global Fishing Watch ship data. If no data is available for a given date, we fill the value with NA (Not Available).
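One way to attach environmental values to ship records while keeping NA for missing matches is a left join with base R's `merge`. The data frames and column names here are hypothetical, and this is a sketch of the idea rather than the project's actual code.

```r
# Hypothetical ship records on a 1-degree grid.
ships <- data.frame(
  date = as.Date(c("2012-01-01", "2012-01-01", "2012-01-02")),
  lat = c(-5, 2, -5),
  lon = c(110, 120, 110),
  fishing_hours = c(3.1, 0, 1.2)
)

# Hypothetical SST values; note there is nothing for 2012-01-02.
sst <- data.frame(
  date = as.Date(c("2012-01-01", "2012-01-01")),
  lat = c(-5, 2),
  lon = c(110, 120),
  sst = c(28.4, 29.7)
)

# all.x = TRUE keeps every ship row; unmatched rows get NA in `sst`.
combined <- merge(ships, sst, by = c("date", "lat", "lon"), all.x = TRUE)
```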
Our ship data is daily, while the chlorophyll data is unevenly spaced in time, mostly at intervals of 7 days 23h 4m 19s, so we map each ship record to the closest chlorophyll date. The SST data needs no such adjustment because it is already daily.
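Matching each ship date to the closest chlorophyll date can be sketched like this. `nearest_date` is a hypothetical helper, the dates are made up, and ties go to the earlier reference date because `which.min` returns the first minimum.

```r
# For each date, return the closest date in ref_dates.
nearest_date <- function(dates, ref_dates) {
  ref_num <- as.numeric(ref_dates)  # days since 1970-01-01
  idx <- vapply(
    as.numeric(dates),
    function(d) which.min(abs(ref_num - d)),
    integer(1)
  )
  ref_dates[idx]
}

ship_dates <- as.Date(c("2012-01-03", "2012-01-06"))
chl_dates <- as.Date(c("2012-01-01", "2012-01-09"))

# 2012-01-03 is 2 days from the first date; 2012-01-06 is 3 days from the second.
matched <- nearest_date(ship_dates, chl_dates)
```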
The ship, SST, and chlorophyll data all have different latitude and longitude resolutions. We rescale the SST and chlorophyll data to a 1° grid, then map the ship data by rounding its coordinates to the nearest SST and chlorophyll grid values.
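A rough sketch of the rescaling step, assuming that fine-grid cells are averaged into the 1° cell whose center is the nearest integer degree. The README does not state which aggregation was used, so the mean is an assumption, as are the helper name `to_one_degree` and the sample values.

```r
# Round coordinates to the nearest degree and average the values per cell.
to_one_degree <- function(df, value_col) {
  aggregate(
    df[value_col],
    by = list(lat = round(df$lat), lon = round(df$lon)),
    FUN = mean,
    na.rm = TRUE
  )
}

# Hypothetical fine-grid chlorophyll samples.
fine <- data.frame(
  lat = c(0.10, 0.30, 0.90),
  lon = c(100.20, 100.40, 100.10),
  chl = c(1, 3, 5)
)

coarse <- to_one_degree(fine, "chl")

# Ship positions are snapped to the same grid by rounding.
ship_lat <- round(-5.2)
ship_lon <- round(110.3)
```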
To classify the data, we assume that a ship with fishing hours above zero is fishing tuna. The tuna value at each coordinate is therefore either 0 or 1.
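The labeling rule is a one-liner in R. The function name `label_tuna` is hypothetical; missing fishing hours stay NA rather than being forced to 0.

```r
# 1 when fishing hours are above zero, 0 otherwise, NA when unknown.
label_tuna <- function(fishing_hours) {
  as.integer(fishing_hours > 0)
}

labels <- label_tuna(c(0, 0.25, 3, NA))
```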
The prediction data is built from sea surface temperature and chlorophyll-a from 2012-01-01 to 2018-03-31. It is generated per day on 1° x 1° tiles, each combined with its sea surface temperature and chlorophyll-a values. Rows can contain NA values, and we remove every row that has one. The result is saved in the prediction_data directory.
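Building a daily 1° x 1° grid and dropping incomplete rows might look like the sketch below. It assumes tiles are identified by integer degrees inside the latitude and longitude range given earlier, uses only two days to stay small, and fills in made-up SST and chlorophyll values.

```r
# One row per (date, lat, lon) tile.
make_grid <- function(dates, lats = -14:8, lons = 85:142) {
  expand.grid(date = dates, lat = lats, lon = lons, KEEP.OUT.ATTRS = FALSE)
}

grid <- make_grid(as.Date(c("2012-01-01", "2012-01-02")))

# Hypothetical feature values, with a few missing chlorophyll readings.
grid$sst <- 28
grid$chl <- 0.3
grid$chl[1:10] <- NA

# Remove every row that contains an NA.
prediction_data <- grid[complete.cases(grid), ]
```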
After preparing the data we need, we predict the locations with a Naive Bayes classifier.
The probability density function for the normal distribution is defined by two parameters (mean and standard deviation).
| SST | 0 (No Tuna) | 1 (Tuna) |
|---|---|---|
| Mean | 29.098720 | 28.240935 |
| SD | 1.188109 | 1.102100 |

| Chlorophyll | 0 (No Tuna) | 1 (Tuna) |
|---|---|---|
| Mean | 0.7860678 | 0.3662977 |
| SD | 1.1794882 | 0.8036224 |
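To illustrate how these means and standard deviations feed a Gaussian Naive Bayes decision, the sketch below multiplies the per-feature normal densities for each class and normalizes. It does not call the naivebayes package. The equal class priors are an assumption, since the README does not report the priors, and `posterior_tuna` is a hypothetical name.

```r
# Per-class Gaussian parameters copied from the tables above.
params <- list(
  sst = list(mean = c(no_tuna = 29.098720, tuna = 28.240935),
             sd   = c(no_tuna = 1.188109,  tuna = 1.102100)),
  chl = list(mean = c(no_tuna = 0.7860678, tuna = 0.3662977),
             sd   = c(no_tuna = 1.1794882, tuna = 0.8036224))
)

# Posterior probability of "tuna" for given SST and chlorophyll values.
# Equal priors are assumed here; real priors would come from the training data.
posterior_tuna <- function(sst, chl, prior = c(no_tuna = 0.5, tuna = 0.5)) {
  likelihood <- function(k) {
    dnorm(sst, params$sst$mean[[k]], params$sst$sd[[k]]) *
      dnorm(chl, params$chl$mean[[k]], params$chl$sd[[k]])
  }
  score_tuna <- prior[["tuna"]] * likelihood("tuna")
  score_none <- prior[["no_tuna"]] * likelihood("no_tuna")
  score_tuna / (score_tuna + score_none)
}

p_cool <- posterior_tuna(sst = 28.0, chl = 0.3)  # cooler, low-chlorophyll water
p_warm <- posterior_tuna(sst = 30.5, chl = 1.5)  # warmer, high-chlorophyll water
```

Under these assumptions, cooler water with low chlorophyll leans toward class 1 and warmer water with high chlorophyll leans toward class 0, which matches the direction of the class means in the tables.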
| | Actual: NO | Actual: YES |
|---|---|---|
| Predicted: NO | 194743 | 70047 |
| Predicted: YES | 119649 | 194014 |
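From the confusion matrix, we can get the accuracy of the Naive Bayes model, which is the sum of the correct predictions divided by the total number of predictions: (194743 + 194014) / 578453 ≈ 0.672.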
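The accuracy calculation can be reproduced from the confusion matrix counts. The matrix layout (rows as predictions, columns as actual classes) follows the table; the variable names are illustrative.

```r
# Confusion matrix from this README; matrix() fills column by column.
cm <- matrix(
  c(194743, 119649,   # actual NO:  predicted NO, predicted YES
    70047, 194014),   # actual YES: predicted NO, predicted YES
  nrow = 2,
  dimnames = list(predicted = c("NO", "YES"), actual = c("NO", "YES"))
)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
```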
To visualize the data, we use shiny to show the location of the ships and the result of the tuna prediction.
In this application, the user can use the slider to change the selected date, and the information shown updates according to that date.
The application shows four kinds of information. In the home tab, the right side shows the density of the prediction, while the left side shows the grouping of the density. The user can scroll the mouse over the left side to show a more detailed grouping on the map. When scrolling down, the map shows more precise positions and groupings.
On the Details tab, the user can check the Naive Bayes model graph, which shows the SST and chlorophyll density distributions from the training data. The red line draws the distribution for places with no tuna, and the dashed green line draws the distribution for places with tuna.
Some files included in this repository:
- `fishing_effort/` and `train_data.csv`: Filtered data used for training
- `prediction_data/`: Data for prediction
- `chlorophyll/`: OceanWatch Chlorophyll-a data scaled from 0.05° x 0.05° to 1° x 1° (from 2012 to 2017 and a bit of 2018)
- `data.R`: Functions used for filtering fishing data, except for the latitude and longitude filtering
Files excluded from this repository:
- Sea surface temperature data from NOAA
- Unscaled Chlorophyll-a data
All data used above belong to their respective owners.
This project was made and developed for educational purposes only.