Ability to grab sections of a Wikipedia article #15
Comments
Actually, that functionality is available in the API, through the rvsection parameter of the Query - Revisions API call (http://www.mediawiki.org/wiki/API:Query_-_Properties#revisions_.2F_rv). So you're more than welcome to extend the get method or write a new get_section method to handle this.
Hmm, thanks for the response! I shifted to working on a different part of my project, so I'll come back to this when I need to do some Wikipedia scraping again. I didn't get a chance to look into that API call in depth.
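For anyone who wants to see what such a request looks like on the wire, here is a minimal sketch that only builds the query URL, using Ruby's standard library. The helper name section_query_url is made up for illustration and is not part of this gem; the parameter names (action, prop, rvprop, rvsection, titles, format) are the ones the MediaWiki Query - Revisions call accepts.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): build a MediaWiki API URL
# that asks for the wikitext of a single section of a page.
def section_query_url(api_base, title, section)
  params = {
    'action'    => 'query',
    'prop'      => 'revisions',
    'rvprop'    => 'content',      # return the revision text
    'rvsection' => section.to_s,   # 0 is the lead section
    'titles'    => title,
    'format'    => 'json'
  }
  uri = URI(api_base)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts section_query_url('https://en.wikipedia.org/w/api.php', 'Banana', 0)
```

Fetching that URL and pulling the text out of the JSON response is left out here, since the gem's own get method already handles that part.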
…ptions. Should solve jpatokal#15 and jpatokal#29, amongst others.
This should be possible now by specifying the rvsection option: MediaWiki::Gateway.new('https://en.wikipedia.org/w/api.php').get('Banana', 'rvsection' => 0)
Closing for now, although pull requests to package this up more cleanly are welcome.
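One possible shape for such a wrapper, as a rough sketch only: a get_section helper that forwards to get with the rvsection option. It assumes nothing about the gateway beyond a get(title, options) method like the call shown above; the helper itself and the EchoGateway stand-in are hypothetical and exist only so the example runs without network access.

```ruby
# Hypothetical helper (not part of the gem): fetch one section by number
# by passing 'rvsection' through to the gateway's get method.
def get_section(gateway, title, section)
  # Integer() raises ArgumentError on input like 'Taxonomy', so only
  # numeric section indexes get through.
  gateway.get(title, 'rvsection' => Integer(section))
end

# Stand-in for MediaWiki::Gateway so the sketch runs offline.
# It returns the arguments it received instead of calling the API.
class EchoGateway
  def get(title, options = {})
    [title, options]
  end
end

p get_section(EchoGateway.new, 'Banana', 0)
```

With the real gem, the first argument would be a MediaWiki::Gateway instance pointed at the wiki's api.php.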
Hi,
For my own project, I'm currently writing a Ruby script built on top of this gem and Nokogiri that can easily extract content from just one section of a Wikipedia article. For example, for the article on bananas (http://en.wikipedia.org/wiki/Banana), I might only want to grab the section on Taxonomy and nothing else. My script would make that really easy: you specify the page title and the section number.
I'm interested in contributing this feature to this project, but I'm wondering whether it's appropriate. The functionality isn't supported by the API, and I'm getting it to work by parsing the HTML, so Wikimedia provides no guarantee that this will always work. This feature also wouldn't work on all Wikimedia projects, since not all of them have a table of contents or are broken down into sections. For example, this works on Wikipedia and Wiktionary, but would not work for Wikisource.
What do you think?
Best,
Brandon