Forecast cards are a simple data specification for storing key information about your travel forecast in order to:
- evaluate performance of a forecast over time,
- analyze the collective performance of forecasting systems and institutions over time, and
- identify contributing factors to high performing forecasts.
There are five types of Forecast Cards:
- Points of Interest, such as a roadway segment or transit line,
- Projects, such as a roadway expansion or an HOV designation,
- Scenarios or runs, including information about the forecasting system,
- Forecasts, which are predictions at the points of interest about what the project will do, and
- Observations, which are points of data used to evaluate the forecasts.
Each "card" is a text-based CSV file.
The forecastcards Python library is designed to validate and organize data that conforms to the forecast cards data schema and consists of four main classes:
- Cardset: a set of forecast data projects that conforms to the forecastcards data schema
- Dataset: turns a cardset into a pandas dataframe suitable for estimation purposes
- Project: to validate single projects (much of the same functionality as Cardset)
- Schema: to manage and validate the data schemas
Validate Single Project
validate_project.py "forecastcards/examples/ecdot-lu123-munchkin_tod"
Project
import forecastcards
# project locations can either be
# - a dictionary describing a github location,
# - a local directory, or
# - a github web address.
gh_project = {'username':'e-lo',
              'repository':'forecastcards',
              'branch':'master',
              'subdir':'examples/ecdot-rx123-ybr_hov'}
# load project and validate using default data schema
project = forecastcards.Project(project_location = gh_project)
# check if project is valid
project.valid
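As the comments above note, a project can also be loaded from a local directory. A minimal sketch, assuming the repository has been cloned into the current working directory:

import forecastcards

# load the same example project from a local clone instead of github
local_project = forecastcards.Project(
    project_location="forecastcards/examples/ecdot-rx123-ybr_hov"
)

# check if project is valid
print(local_project.valid)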
Cardset
import forecastcards
# project locations can either be
# - a dictionary describing a github location or
# - a local directory
gh_data = {'username':'e-lo',
           'repository':'forecastcards',
           'branch':'master',
           'subdir':'examples'}
# cardset walks through a directory, finds projects, and validates them
# according to the right schema.
# projects can be excluded or explicitly selected using keyword options
cardset = forecastcards.Cardset(data_loc = gh_data, exclude_projects=['lu123'])
# projects can also be added later from another location; here ex_data stands in
# for a second project location (e.g., a local directory or another github dict)
cardset.add_projects(data_loc=ex_data, select_projects=['lu123'])
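Once a cardset is assembled, the attributes that Dataset consumes below can be inspected directly. A quick sketch, assuming card_locs_by_type maps each card type to the card files found and file_to_project_id maps each card file to its project:

# count the card files found for each card type
for card_type, locs in cardset.card_locs_by_type.items():
    print(card_type, len(locs))

# list the projects represented in the cardset
print(set(cardset.file_to_project_id.values()))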
Dataset
Create a dataset suitable for estimating quantile regressions.
import forecastcards
# project locations can either be
# - a dictionary describing a github location or
# - a local directory
gh_data = {'username':'e-lo',
           'repository':'forecastcards',
           'branch':'master',
           'subdir':'examples'}
cardset = forecastcards.Cardset(data_loc = gh_data)
dataset = forecastcards.Dataset(card_locs_by_type=cardset.card_locs_by_type,
                                file_to_project_id=cardset.file_to_project_id)
# access to the dataframe
dataset.df
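From here, a quantile regression can be estimated with, for example, statsmodels. This is only a sketch; the column names observed and forecast are placeholders for whatever the combined dataframe actually contains.

import statsmodels.formula.api as smf

# estimate the median (q=0.5) relationship between observations and forecasts;
# replace 'observed' and 'forecast' with the actual column names in dataset.df
model = smf.quantreg("observed ~ forecast", data=dataset.df)
result = model.fit(q=0.5)
print(result.summary())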
Forecast Cards are compatible with the Open Knowledge Foundation's Frictionless Data Table Schema specification.
Explore the data schema from your web browser using Colaboratory.
This project currently includes one example, the Emerald City DOT's HOV expansion for the Yellow Brick Road, contained in forecastcards/examples/emeraldcitydot-rx123-yellowbrickroadhov.
This example can be analyzed with the notebooks in the notebooks folder of this repository, which can be run locally or using Binder or Colaboratory.
In order to leverage a common set of tools, we suggest that forecast card data be stored using the following naming and folder structure:
agency-name-project-id-project-short-name/
|---README.md
|---project-<project-id>-<project-short-name>.csv
|---scenarios-<project-id>.csv
|---poi-<project-id>.csv
|---observations/
|    |---observations-<date>.csv
|---forecasts/
|    |---forecast-<scenario-id>-<scenario-year>-<forecast-creation>-<forecast-id>.csv
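A few lines of Python can create this skeleton for a new project. This is a sketch only, using made-up agency and project identifiers:

from pathlib import Path

# hypothetical identifiers -- substitute your own agency, project id, and short name
agency, project_id, short_name = "ecdot", "rx124", "tinman_brt"

root = Path(f"{agency}-{project_id}-{short_name}")
(root / "observations").mkdir(parents=True, exist_ok=True)
(root / "forecasts").mkdir(parents=True, exist_ok=True)

# empty placeholder card files following the naming convention above
(root / "README.md").touch()
(root / f"project-{project_id}-{short_name}.csv").touch()
(root / f"scenarios-{project_id}.csv").touch()
(root / f"poi-{project_id}.csv").touch()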
- Make sure you have the required data by examining the schema.
- Create or format data as Forecast Cards:
  - use the csv templates and enter data using a text editor or a spreadsheet application, or
  - convert existing data using the helper scripts (on the way).
- Use the template notebooks locally or on a hosted remote server (e.g., Colaboratory) to clean data and estimate quantile regressions.
Note: If you don't want to install forecastcards locally, you can run the code notebooks in the cloud using Google Colab.
forecastcards requires Python 3.6 or higher, and it is recommended that you install it in a virtual environment (e.g., Conda).
You can install forecastcards from this github repository using pip:
pip install --upgrade git+https://github.com/e-lo/forecastcards.git@master#egg=forecastcards
If you plan to make changes, you can clone this git repository and install from your local, cloned directory using pip:
pip install --upgrade .
If you are using a newer version of macOS, you may have trouble installing one of the dependencies because its setup.py settings are not up to date. You can successfully install it by overriding the default compiler using:
CFLAGS='-stdlib=libc++' pip install cchardet
- decide where your data will live: local file server or github repository
- catalog and convert historic data
Use the Create_Forecast_Cards notebook locally, or just use the templates:
- Copy the \template folder in the forecastcards package to your folder for holding all the project forecast cards.
- Rename the project folder according to the schema, taking care not to duplicate any project IDs within your analysis scope (usually your agency or the forecastcardsdata store).
- Add observations, POIs, forecast runs, and forecasts for specific POIs as they are determined or created.
- Confirm the data in the new project conforms to the data schema by running validate_project.py <project_directory>, or check all the projects in a directory by running validate_cardset.py from that directory or validate_project.py <cardset_directory>.
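The same check can also be run from Python rather than the command line, using the Project class shown earlier; the directory name below is hypothetical.

import forecastcards

# validate the new project directory against the default data schema
project = forecastcards.Project(project_location="ecdot-rx124-tinman_brt")  # hypothetical path
assert project.valid, "project cards do not conform to the data schema"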
Use the Create_Forecast_Cards notebook locally, or just use the templates from the \template folder:
- Add a new forecast csv file with relevant data for points of interest
- Add an entry to the scenarios csv file about the model run
- Add any additional points of interest to the poi csv file
- Confirm the new data in the project conforms to the data schema by running validate_project.py <project_directory>, or check all the projects in a directory by running validate_cardset.py from that directory or validate_project.py <cardset_directory>.
Use the Create_Forecast_Cards notebook locally, or just use the templates from the \template folder:
- Add a new observations csv
- Confirm the new data in the project conforms to the data schema by running validate_project.py <project_directory>, or check all the projects in a directory by running validate_cardset.py from that directory or validate_project.py <cardset_directory>.
- Select cards to use
- Clean and merge cards
- Create any additional categorical variables (see the sketch after this list)
- Perform regressions
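For the categorical-variables step, a small pandas sketch, building on the dataset.df example above; the column name forecast is again a placeholder:

import pandas as pd

# bin a (hypothetical) numeric forecast column into categories before estimation;
# replace the column name and bin edges with ones appropriate to your data
dataset.df["forecast_class"] = pd.cut(
    dataset.df["forecast"],
    bins=[0, 10_000, 50_000, float("inf")],
    labels=["low", "medium", "high"],
)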
There are three likely options for making your data available:
- Github (not great for extremely large datasets)
- Amazon S3 / Microsoft Azure / Google Cloud (functionality coming soon)
- Other agency-hosted web services (e.g., Socrata, a webserver, etc.)
You can submit forecast cards to the community data store by:
- submitting a pull-request to the forecastcardsdata repository
- submitting an issue with a link to the location of the data along with permission to host it on the repository.
- setting up the public data store as a mirror.
Please submit an issue!