This project applies a Random Forest Regressor to predict average vehicle speed in Berlin from traffic density data. By analyzing the relationship between vehicle counts (total, cars, trucks) and average speed, the project provides insights into traffic behavior and evaluates the model's predictive performance.
- Project Description
- Dataset
- Installation
- Usage
- Results
- Visualizations
- Technologies Used
- License
Accurately predicting average vehicle speed helps us understand traffic patterns and improve transportation systems. This project:
- Trains a Random Forest Regressor using traffic density data.
- Evaluates the model with metrics like Mean Squared Error (MSE) and R-squared (R²).
- Visualizes model results and feature importance for interpretability.
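Both evaluation metrics follow directly from their definitions: MSE is the mean of the squared prediction errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The minimal sketch below, assuming NumPy and scikit-learn are installed, checks hand-written versions (`manual_mse` and `manual_r2` are hypothetical helper names, not part of the project) against scikit-learn's `mean_squared_error` and `r2_score` on a few made-up speeds.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def manual_mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted speeds."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def manual_r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up speeds in km/h, for illustration only (not project data)
actual = [50.0, 42.0, 38.0, 55.0]
predicted = [48.0, 45.0, 40.0, 50.0]

print(manual_mse(actual, predicted), mean_squared_error(actual, predicted))  # 10.5 10.5
print(manual_r2(actual, predicted), r2_score(actual, predicted))
```

Because the errors are squared, MSE is expressed in (km/h)², while R² is unitless and compares the model against always predicting the mean speed.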
The dataset contains hourly traffic data from Berlin:
- Features:
  - vehicle_count_per_hour: Total vehicles per hour.
  - car_count_per_hour: Total cars per hour.
  - truck_count_per_hour: Total trucks per hour.
- Target:
  - avg_speed_all_vehicles_kmh: Average speed of all vehicles (km/h).
- The dataset is stored in a CSV file and uses a semicolon (;) as the delimiter.
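A minimal sketch of loading the data with pandas, assuming the CSV has a header row containing the column names listed above. `load_traffic_data` is a hypothetical helper, and dropping rows with missing values is an assumption rather than the project's documented preprocessing.

```python
import io

import pandas as pd

# Column names as listed in the Dataset section
FEATURES = ["vehicle_count_per_hour", "car_count_per_hour", "truck_count_per_hour"]
TARGET = "avg_speed_all_vehicles_kmh"


def load_traffic_data(path_or_buffer):
    """Hypothetical helper: read the semicolon-delimited CSV and split it into X and y."""
    df = pd.read_csv(path_or_buffer, sep=";")
    # Assumption: rows with a missing feature or target value are simply dropped
    df = df[FEATURES + [TARGET]].dropna()
    return df[FEATURES], df[TARGET]


# Tiny in-memory sample with made-up values, standing in for berlin_traffic_data.csv
sample = io.StringIO(
    "vehicle_count_per_hour;car_count_per_hour;truck_count_per_hour;avg_speed_all_vehicles_kmh\n"
    "1200;1100;100;42.5\n"
    "300;280;20;55.0\n"
)
X, y = load_traffic_data(sample)
print(X.shape, y.tolist())  # (2, 3) [42.5, 55.0]
```

For the real file, call `load_traffic_data("berlin_traffic_data.csv")`. If the file writes decimals with a comma, `pd.read_csv` also accepts `decimal=","`; whether that is needed here is an assumption to check against the data.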
- Clone the repository and change into its directory: `git clone https://github.com/busrayatlav/Berlin-Traffic-Random-Forest.git`, then `cd Berlin-Traffic-Random-Forest`.
- Set up a Python virtual environment (optional but recommended): `python -m venv venv`, then activate it with `source venv/bin/activate` (on Windows: `venv\Scripts\activate`).
- Install the dependencies: `pip install -r requirements.txt`
- Load the dataset: ensure the dataset file (`berlin_traffic_data.csv`) is in the same directory as the script, or update the file path in the script accordingly (e.g. `/path/to/berlin_traffic_data.csv`).
- Run the script to train the model and generate the outputs: `python berlin_traffic_random_forest.py`
- Outputs:
  - Model performance metrics (MSE, R²) will be displayed in the terminal.
  - Visualizations will either be saved or displayed directly.
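The training step can be sketched as a hold-out evaluation with scikit-learn. This is not the project's script: the 80/20 split, `n_estimators=100`, `random_state=42`, and the helper names `make_synthetic_frame` and `train_and_evaluate` are all assumptions, and the synthetic data exists only so the example runs without the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

FEATURES = ["vehicle_count_per_hour", "car_count_per_hour", "truck_count_per_hour"]
TARGET = "avg_speed_all_vehicles_kmh"


def make_synthetic_frame(n_rows=500, seed=0):
    """Hypothetical stand-in for the real CSV so the sketch runs on its own."""
    rng = np.random.default_rng(seed)
    cars = rng.integers(100, 2000, size=n_rows)
    trucks = rng.integers(0, 200, size=n_rows)
    # Made-up relationship: more traffic means lower speed, plus random noise
    speed = 60 - 0.01 * cars - 0.05 * trucks + rng.normal(0, 5, size=n_rows)
    return pd.DataFrame({
        "vehicle_count_per_hour": cars + trucks,
        "car_count_per_hour": cars,
        "truck_count_per_hour": trucks,
        TARGET: speed,
    })


def train_and_evaluate(df, test_size=0.2, random_state=42):
    """Fit a Random Forest on a train split and report MSE and R² on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=test_size, random_state=random_state
    )
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "model": model,
        "y_test": y_test,
        "y_pred": y_pred,
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }


if __name__ == "__main__":
    results = train_and_evaluate(make_synthetic_frame())
    print(f"Mean Squared Error (MSE): {results['mse']:.2f}")
    print(f"R-squared (R²): {results['r2']:.2f}")
```

To use the real data, pass a DataFrame read with `pd.read_csv("berlin_traffic_data.csv", sep=";")` instead of the synthetic frame. Fixing `random_state` makes both the split and the forest reproducible, so the printed metrics stay the same between runs.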
- Mean Squared Error (MSE): 188.14
- R-squared (R²): 0.27
- The model's predictive accuracy is modest (it explains roughly 27% of the variance in average speed), but it still highlights the key features influencing average speed.
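The reported numbers can be put on a km/h scale with a little arithmetic. Taking the square root of the MSE gives the root mean squared error. Assuming both metrics were computed on the same evaluation set, the definition of R² also recovers the (population) variance $\sigma_y^2$ of the actual speeds in that set, which is the MSE of a baseline that always predicts the mean speed. Since R² is rounded to two decimals, these values are approximate.

$$
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{188.14} \approx 13.72\ \text{km/h}
$$

$$
R^2 = 1 - \frac{\mathrm{MSE}}{\sigma_y^2}
\quad\Longrightarrow\quad
\sigma_y^2 = \frac{\mathrm{MSE}}{1 - R^2} \approx \frac{188.14}{0.73} \approx 257.7,
\qquad \sigma_y \approx 16.05\ \text{km/h}
$$

In other words, the model's typical error of about 13.7 km/h compares with about 16 km/h for simply predicting the average speed every hour.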
- Actual vs Predicted Speeds: A scatter plot comparing actual traffic speeds to model predictions.
- Feature Importance: A bar chart showing the relative importance of input features.
- Residual Plot: A scatter plot of residuals to evaluate prediction errors.
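The three plots can be produced with Matplotlib roughly as follows. This is a sketch, not the project's plotting code: the function name `plot_results`, the figure sizes, and the labels are assumptions. It expects the test targets, the predictions, the feature names, and an importance vector such as a fitted `RandomForestRegressor`'s `feature_importances_` attribute.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so figures can be saved without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_results(y_test, y_pred, feature_names, importances):
    """Build the three diagnostic figures and return them keyed by name."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_test - y_pred
    figures = {}

    # Actual vs predicted: points on the dashed diagonal are perfect predictions
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_test, y_pred, alpha=0.5)
    low = min(y_test.min(), y_pred.min())
    high = max(y_test.max(), y_pred.max())
    ax.plot([low, high], [low, high], "r--", label="Perfect prediction")
    ax.set_xlabel("Actual speed (km/h)")
    ax.set_ylabel("Predicted speed (km/h)")
    ax.set_title("Actual vs Predicted Speeds")
    ax.legend()
    figures["actual_vs_predicted"] = fig

    # Feature importance: horizontal bars sorted from least to most important
    order = np.argsort(importances)
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.barh([feature_names[i] for i in order], [importances[i] for i in order])
    ax.set_xlabel("Importance")
    ax.set_title("Feature Importance")
    fig.tight_layout()
    figures["feature_importance"] = fig

    # Residuals vs predicted: a patternless band around zero suggests no systematic bias
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(y_pred, residuals, alpha=0.5)
    ax.axhline(0, color="red", linestyle="--")
    ax.set_xlabel("Predicted speed (km/h)")
    ax.set_ylabel("Residual (actual - predicted, km/h)")
    ax.set_title("Residual Plot")
    figures["residuals"] = fig

    return figures
```

Each returned figure can be written to disk with, for example, `fig.savefig("actual_vs_predicted.png")`. The `Agg` backend is chosen so this works on a machine without a display; remove the `matplotlib.use("Agg")` line and call `plt.show()` to view the figures interactively instead.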
- Python: Programming language.
- Pandas: Data manipulation.
- scikit-learn: Machine learning library.
- Matplotlib: Data visualization.
This project is licensed under the MIT License.