EddyLuten/domain-scrape

Scrapes the pages and resources on a domain, starting from the provided URL.
The local directory structure mirrors the URL paths as closely as possible.
Links are found by inspecting the src and href attributes of each HTML page.
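
The description above can be illustrated with a minimal sketch. This is not the code in scrape.py; it only shows one way to collect src and href values with the standard library and to map a URL onto a local path. The names LinkCollector, same_domain and local_path are hypothetical, and both the subdomain check and the index.html fallback for directory-style URLs are assumptions.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the value of every src and href attribute in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_domain(url, domain):
    """True if the URL's host is the domain or a subdomain (assumed rule)."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def local_path(url, out_dir="."):
    """Map a URL path to a file path under out_dir."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumed file name for directory-style URLs
    return str(PurePosixPath(out_dir) / path.lstrip("/"))


collector = LinkCollector("http://example.com/docs/")
collector.feed('<a href="page.html">x</a><img src="/img/logo.png">')
for link in collector.links:
    if same_domain(link, "example.com"):
        print(link, "->", local_path(link, "out"))
```

Resolving each reference with urljoin before filtering means relative and absolute links are treated the same way when deciding whether they belong to the domain.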

Usage: scrape.py [OPTIONS] domain url

Options:
  -h, --help  show the help message and exit
  --out       output directory; defaults to the working directory if not provided

Examples:

Scrape the google.com domain, starting at http://google.com/:
  python ./scrape.py google.com http://google.com/  

Scrape the github.com domain, store in the provided directory:
  python ./scrape.py --out ./github github.com http://github.com/
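
For readers who want to see how such a command line could be parsed, here is a hedged sketch using argparse. It is an assumption, not the parser used by scrape.py; build_parser is a hypothetical helper, and only the arguments documented above (--out, domain, url) are modeled.

```python
import argparse
import os


def build_parser():
    """Build a parser for: scrape.py [OPTIONS] domain url."""
    parser = argparse.ArgumentParser(
        prog="scrape.py",
        description="Scrapes the pages and resources on a domain.",
    )
    # Optional output directory; falls back to the working directory.
    parser.add_argument(
        "--out",
        default=os.getcwd(),
        help="output directory (defaults to the working directory)",
    )
    # Two required positionals, in the order shown in the usage line.
    parser.add_argument("domain", help="domain to scrape, e.g. google.com")
    parser.add_argument("url", help="URL to start scraping from")
    return parser


# Parse the second example from above instead of reading sys.argv.
args = build_parser().parse_args(
    ["--out", "./github", "github.com", "http://github.com/"]
)
print(args.out, args.domain, args.url)
```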

