Compare predictive modeling techniques for bird species distribution in the US using data from eBird, US boundary data, and several raster datasets (e.g., Digital Elevation Models (DEM), urban imperviousness, land cover, canopy, weather, hydrography, and soil). The project involves collecting and preprocessing these data, implementing the models, and evaluating them with metrics such as partial ROC curves, F1 score, precision, recall, and accuracy.
- Implement species distribution models such as Logistic Regression, LightGBM, XGBoost, K-Nearest Neighbors (KNN), and Random Forest, and compare their performance against MaxEnt and Poisson Point Process models.
- Examine data processing methods, address absence data and autocorrelation, and explore sampling methods that improve model performance and reduce bias.
- Evaluate model performance across several bird species and study areas in the US.
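
As an illustration of the threshold-based metrics named above (precision, recall, F1 score, and accuracy), the following is a minimal sketch of how they might be computed for one species from binary presence/absence labels. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical and not taken from the project; partial ROC is not covered here because it needs continuous suitability scores rather than thresholded predictions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the threshold-based metrics."""
    precision: float
    recall: float
    f1: float
    accuracy: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute metrics for presence (1) / absence (0) labels.

    Assumes predictions have already been thresholded to 0 or 1.
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted presence, observed presence
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted presence, observed absence
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed presence
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly predicted absence

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return BinaryMetrics(precision, recall, f1, accuracy)


# Toy labels for illustration only (not real eBird data).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(observed, predicted)
print(metrics)
```

Running the same function on each model's held-out predictions would give a like-for-like comparison across models, species, and study areas, provided every model is scored on the same test observations.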