This is a fork to make a permanent backup of the SCP wiki.

This is a Python command-line client for the relatively popular wiki host http://www.wikidot.com. It lets you:

  • List all pages on a site
  • See all revisions of a page
  • Query page source

Most interestingly, it lets you download a whole site as a Git repository, with proper commit dates, authors, and comments!
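One way to put historical dates and authors onto commits is through Git's own environment variables. The sketch below is not taken from crawl.py: the `revision_commit_env` helper, the placeholder e-mail domain, and the sample revision are all hypothetical. It only builds the environment that a `git commit` subprocess would receive.

```python
from datetime import datetime, timezone


def revision_commit_env(author, when, email_domain="example.invalid"):
    """Build Git environment variables for one wiki revision (hypothetical helper).

    `when` must be a timezone-aware datetime.
    """
    # Git's internal date format is "<unix timestamp> <utc offset>".
    stamp = f"{int(when.timestamp())} +0000"
    # Assumption: no e-mail address is available, so a placeholder is synthesized.
    email = f"{author.replace(' ', '_')}@{email_domain}"
    return {
        "GIT_AUTHOR_NAME": author,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_AUTHOR_DATE": stamp,
        "GIT_COMMITTER_NAME": author,
        "GIT_COMMITTER_EMAIL": email,
        "GIT_COMMITTER_DATE": stamp,
    }


env = revision_commit_env(
    "Some User", datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
```

The resulting dictionary could be merged into `os.environ` and passed as `env=` to a `subprocess.run(["git", "commit", ...])` call, so that the commit carries the revision's date and author rather than the time of the download.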

Dependencies

At least:

  • Python 3
  • python-beautifulsoup4
  • python-gitpython
  • python-requests
  • python-tqdm

Examples:

crawl.py http://example.wikidot.com --dump ExampleRepo
crawl.py http://example.wikidot.com --log --page example-page

It uses internal Wikidot AJAX requests to do its job. If you're from Wikidot, please don't break it. Thank you! We'll try to be nice and not put much load on your servers.
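For readers unfamiliar with this kind of request, here is a rough sketch of how one might be assembled. Everything specific in it is an assumption rather than something this README documents: the `ajax-module-connector.php` endpoint, the `wikidot_token7` field that is mirrored in a cookie, the `moduleName` parameter, and the example module name. The `build_ajax_request` helper is hypothetical and sends nothing over the network.

```python
import secrets
from urllib.parse import urljoin


def build_ajax_request(site_url, module_name, **params):
    """Return (url, form_data, cookies) for one AJAX call (hypothetical helper)."""
    # Assumption: the endpoint lives at the site root under this name.
    url = urljoin(site_url.rstrip("/") + "/", "ajax-module-connector.php")
    # Assumption: the server expects the same random token in the form and a cookie.
    token = secrets.token_hex(4)
    data = {"moduleName": module_name, "wikidot_token7": token, **params}
    cookies = {"wikidot_token7": token}
    return url, data, cookies


url, data, cookies = build_ajax_request(
    "http://example.wikidot.com",
    "history/PageRevisionListModule",  # assumed module name
    page_id=1,
)
```

The three values could then be handed to `requests.post(url, data=data, cookies=cookies)`. Adding a short pause between calls is one simple way to keep the load on the servers low.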

Downloading large sites might take a while. If anything breaks, just rerun the same command; it will continue from where it crashed.
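Resuming after a crash generally means recording progress somewhere durable. The following is a minimal sketch of that idea, not the mechanism crawl.py actually uses: the JSON state file and the `load_done`, `save_done`, and `process_all` helpers are hypothetical names.

```python
import json
import os
import tempfile


def load_done(state_path):
    """Return the set of items already processed, or an empty set on first run."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, encoding="utf-8") as fh:
        return set(json.load(fh))


def save_done(state_path, done):
    """Write the state to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    # os.replace is atomic, so a crash never leaves a half-written state file.
    os.replace(tmp_path, state_path)


def process_all(items, state_path, handle):
    """Process each item once, skipping those recorded by an earlier run."""
    done = load_done(state_path)
    for item in items:
        if item in done:
            continue
        handle(item)
        done.add(item)
        save_done(state_path, done)
    return done
```

Because progress is saved only after `handle` returns, an item that was interrupted midway is simply retried on the next run.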

Useful links:

Wikidot code (very old) which simplifies things a bit:

The descriptions for on-site modules are heavily correlated with AJAX ones:

Someone else did Wikidot AJAX:

TODO

  • Handle deleted images. This probably requires checking the diff and, when an image is removed from one page, checking all other pages for remaining references.
  • Handle tags (both added and removed).
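The deleted-images item could be approached with a set comparison along these lines. This is only a sketch under assumptions: the `orphaned_images` helper is hypothetical, and it presumes the image references of every page have already been extracted into a mapping from page name to a set of image names.

```python
def orphaned_images(old_refs, new_refs):
    """Return images removed from some page and no longer referenced anywhere.

    Both arguments map a page name to the set of image names it references,
    before and after a revision respectively.
    """
    removed = set()
    for page, old_images in old_refs.items():
        # Images this page referenced before but does not reference now.
        removed |= old_images - new_refs.get(page, set())
    # Anything still referenced by any page must be kept.
    still_used = set().union(*new_refs.values())
    return removed - still_used


before = {"page-a": {"x.png", "y.png"}, "page-b": {"x.png"}}
after = {"page-a": {"y.png"}, "page-b": {"x.png"}}
candidates = orphaned_images(before, after)
```

Here `x.png` was dropped from `page-a` but is still used by `page-b`, so it is not reported as safe to delete.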