
Ability to grab sections of a Wikipedia article #15

Closed
branliu0 opened this issue Jun 21, 2011 · 4 comments

Comments

@branliu0
Contributor

Hi,

For my own project, I'm currently writing a Ruby script built on top of this gem and Nokogiri that can easily extract content from just one section of a Wikipedia article. For example, from the article on bananas (http://en.wikipedia.org/wiki/Banana), I might only want to grab the Taxonomy section and nothing else. My script would make that easy: you specify the page title and the section number.

I'm interested in contributing this feature to this project, but I'm wondering whether it's appropriate. The functionality isn't supported by the API; I'm getting it to work by parsing the rendered HTML, so Wikimedia provides no guarantee that it will keep working. The feature also wouldn't work on all Wikimedia projects, since not all of them have a table of contents or are broken into sections. For example, it works on Wikipedia and Wiktionary, but would not work on Wikisource.

What do you think?

Best,
Brandon

@jpatokal
Owner

Actually, that functionality is available in the API, through the rvsection parameter of the Query - Revisions API call:

http://www.mediawiki.org/wiki/API:Query_-_Properties#revisions_.2F_rv
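For reference, a raw Query - Revisions request for a single section might be built as below. This is a minimal sketch using only Ruby's standard library. The method name `revision_section_url` is hypothetical, and the extra parameters (`prop=revisions`, `rvprop=content`, `format=json`) are the usual ones for fetching wikitext, not something this gem exposes.

```ruby
require 'uri'

# Hypothetical helper (not part of mediawiki-gateway): builds a
# Query - Revisions URL that asks for the wikitext of one section.
# rvsection=0 is the lead, i.e. the text before the first heading.
def revision_section_url(api, title, section)
  unless section.is_a?(Integer) && section >= 0
    raise ArgumentError, 'section must be a non-negative Integer'
  end

  uri = URI(api)
  uri.query = URI.encode_www_form(
    action: 'query',
    prop: 'revisions',
    titles: title,
    rvprop: 'content',
    rvsection: section,
    format: 'json'
  )
  uri.to_s
end
```

The resulting URL could then be fetched with something like `Net::HTTP.get(URI(url))`. The shape of the JSON that comes back depends on the wiki's API version, so parsing it is left out of this sketch.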

So you're more than welcome to extend the `get` method or write a new `get_section` method to handle this.
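A `get_section` method along those lines could be a thin wrapper that forwards `rvsection` to `get`. The sketch below is hypothetical: the `SectionFetching` module is not part of mediawiki-gateway, and it assumes the including class has a `get(title, options)` method that passes extra options through to the API, as in the `rvsection` example later in this thread.

```ruby
# Hypothetical mixin; get_section is NOT part of mediawiki-gateway.
# It assumes the including class defines get(title, options), where
# options are passed through to the Query - Revisions request.
module SectionFetching
  # Returns whatever get returns for a single section of +title+.
  # Section 0 is the lead; higher numbers follow the page's headings.
  def get_section(title, section)
    unless section.is_a?(Integer) && section >= 0
      raise ArgumentError, 'section must be a non-negative Integer'
    end

    get(title, 'rvsection' => section)
  end
end
```

Under that assumption it could be mixed in with `MediaWiki::Gateway.include(SectionFetching)` and called as `mw.get_section('Banana', 0)`. Keeping it a wrapper means `get` itself stays unchanged.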

@branliu0
Contributor Author

Hmm, thanks for the response! I shifted to working on a different part of my project, so I'll come back to this when I need to do some Wikipedia scraping again. I didn't get a chance to look into that API call in depth.

blackwinter added a commit to blackwinter/mediawiki-gateway that referenced this issue Jul 23, 2014
@blackwinter
Collaborator

This should be possible now by passing the section number to retrieve as the `rvsection` option (see #61):

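```ruby
require 'media_wiki'

# rvsection 0 is the lead section (the text before the first heading).
MediaWiki::Gateway.new('https://en.wikipedia.org/w/api.php').get('Banana', 'rvsection' => 0)
```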
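`rvsection` takes a section number, so fetching the Taxonomy section by name, as in the original request, needs that number first. One way, sketched here as an assumption rather than anything the gem provides, is the API's `action=parse` with `prop=sections`, which lists each heading (`line`) with its section number (`index`). Both method names below are hypothetical.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: URL for an action=parse request that lists a
# page's sections instead of returning its rendered HTML.
def sections_url(api, title)
  uri = URI(api)
  uri.query = URI.encode_www_form(action: 'parse', page: title,
                                  prop: 'sections', format: 'json')
  uri.to_s
end

# Hypothetical helper: given the JSON body of that request, returns the
# section number of the first heading equal to +heading+, or nil.
# Headings are compared literally, so formatted headings may not match.
# Sections transcluded from templates have indexes like "T-1"; those
# cannot be used as rvsection for this page, so they map to nil.
def section_index_for(json_body, heading)
  sections = JSON.parse(json_body).dig('parse', 'sections') || []
  match = sections.find { |s| s['line'] == heading }
  match && Integer(match['index'], exception: false)
end
```

The number found this way could then be passed on as `get('Banana', 'rvsection' => index)`. Section numbers follow the page's headings, so they can change when the article is edited; looking the number up just before fetching avoids hard-coding it.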

@jpatokal
Owner

Closing for now, although pull requests to package this up more cleanly are welcome.
