geospatial-jeff/dg-open-data-scraper

Digital Globe Open Data Scraper

Simple GDAL + Scrapy utility for pulling metadata from the Digital Globe Open Data program. The goal is to generate an index for querying the Open Data program.

Usage

The library has a simple CLI. The build command scrapes the Open Data program website to generate a list of available file paths, then uses the gdal.Info utility to read metadata about each file. The metadata records are collected in a list, pickled, and written to the specified text file.

dg-open-data build --output data.txt
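To make the shape of that output concrete, here is a minimal sketch of the "read metadata, collect in a list, pickle" flow. It is not the project's actual code: the record layout (`path`, `bbox`), the helper `bbox_from_info`, and the sample values are assumptions made for illustration. The input dict mirrors the `cornerCoordinates` structure that `gdal.Info(path, format="json")` returns, so the sketch runs without GDAL installed.

```python
import os
import pickle
import tempfile


def bbox_from_info(info):
    """Return (minx, miny, maxx, maxy) from a gdal.Info-style JSON dict."""
    corners = info["cornerCoordinates"]
    keys = ("upperLeft", "lowerLeft", "upperRight", "lowerRight")
    xs = [corners[k][0] for k in keys]
    ys = [corners[k][1] for k in keys]
    return (min(xs), min(ys), max(xs), max(ys))


# With GDAL installed, a dict like this would come from:
#   from osgeo import gdal
#   info = gdal.Info(path, format="json")
# The values below are made up for the example.
sample_info = {
    "description": "scene_a.tif",  # hypothetical file path
    "cornerCoordinates": {
        "upperLeft": [-95.5, 30.0],
        "lowerLeft": [-95.5, 29.5],
        "upperRight": [-95.0, 30.0],
        "lowerRight": [-95.0, 29.5],
    },
}

# One record per file; the record layout is an assumption, not the tool's format.
records = [{"path": sample_info["description"], "bbox": bbox_from_info(sample_info)}]

# Pickle the list to a file, then read it back to confirm the round trip.
out_path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(out_path, "wb") as f:
    pickle.dump(records, f)
with open(out_path, "rb") as f:
    loaded = pickle.load(f)
```

Note that a pickled file is binary even when it carries a `.txt` extension, so it has to be opened in `"rb"` mode and loaded with `pickle.load` rather than read as text.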

You can also translate the output text file to other data formats such as an Rtree:

dg-open-data translate data.txt --output rtree_index --format rtree
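The point of a spatial index such as an Rtree is to answer "which images overlap this area?" quickly. The sketch below shows that query as a plain linear scan over bounding boxes. An R-tree returns the same answer without visiting every record. The record layout and file names are hypothetical and are not taken from the tool's output.

```python
def intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(records, bbox):
    """Return the paths of all records whose bounding box intersects bbox."""
    return [r["path"] for r in records if intersects(r["bbox"], bbox)]


# Hypothetical records; in practice these would come from the built index.
records = [
    {"path": "scene_a.tif", "bbox": (-95.5, 29.5, -95.0, 30.0)},
    {"path": "scene_b.tif", "bbox": (10.0, 45.0, 10.5, 45.5)},
]

# Query window overlapping only the first scene.
hits = query(records, (-95.2, 29.8, -94.8, 30.2))
```

A linear scan like this is fine for a few thousand records. The Rtree format is worth the extra step when the index is queried repeatedly.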

Disclaimer: The full dataset contains a lot of images (~24,000), so the build command can take a long time. Processing the entire dataset took ~20 minutes with 100 threads on a t2.2xlarge EC2 instance.
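Reading metadata from remote files is mostly waiting on the network, which is why a large thread count helps. The following is a rough sketch of that pattern using the standard library, not the project's implementation. `read_metadata` is a hypothetical stand-in for the per-file gdal.Info call, and the file names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def read_metadata(path):
    # Stand-in for a per-file metadata read (e.g. a gdal.Info call).
    return {"path": path, "name_length": len(path)}


# Hypothetical file list; the real one would come from the scraper.
paths = [f"scene_{i}.tif" for i in range(8)]

# pool.map runs the reads concurrently and yields results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_metadata, paths))
```

`max_workers` plays the role of the thread count mentioned above. The best value depends on network bandwidth and on how many concurrent requests the remote host tolerates.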
