A Python project with WikiBot implementations to fix consistency issues on Wikidata.
This repo contains a few utility scripts that fix consistency issues and missing data on Wikidata, focusing on TV series.
It is used by my Wikidata bot, TheFireBenderBot. Check out its contributions to get an idea of what it specializes in. Here are some stats.
- constraint.py contains the abstract definition of a Constraint. This is similar to how Wikidata defines constraints, except that an implementation may also include a way to fix a violation.
- general.py and tv.py contain a few concrete constraint implementations.
- bots contains various Bot implementations that iterate through Wikidata pages using a generator and treat (process) each one.
- television.py contains abstract models for the concepts of Episode, Season, Series and more. Each model has some semantic knowledge of the item it encapsulates, as well as of the constraints it should be checked against.
- wikidata_properties.py has a bunch of constants that encode property codes and a few common ID values. A list of all properties can be found here.
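To make the Constraint idea concrete, here is a minimal sketch of a constraint that can be checked and, optionally, fixed, plus a model that knows which constraints apply to it. All class and method names here (Item, validate, fix, HasProperty, LabelFromTitle, Episode) are hypothetical and not taken from constraint.py or television.py; only the property IDs P179 (part of the series) and P1476 (title) are real Wikidata identifiers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List

TITLE = "P1476"           # Wikidata property: title
PART_OF_SERIES = "P179"   # Wikidata property: part of the series


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Wikidata item."""
    qid: str
    labels: Dict[str, str] = field(default_factory=dict)        # language -> label
    claims: Dict[str, List[str]] = field(default_factory=dict)  # property ID -> values


class Constraint(ABC):
    """A rule an item should satisfy, optionally with a way to fix it."""

    @abstractmethod
    def validate(self, item: Item) -> bool:
        """Return True if the item satisfies the constraint."""

    def fix(self, item: Item) -> bool:
        """Repair the item in place; return True if a fix was applied.

        By default a constraint can only be checked, not fixed.
        """
        return False


class HasProperty(Constraint):
    """Check-only constraint: the item must have at least one `prop` value."""

    def __init__(self, prop: str):
        self.prop = prop

    def validate(self, item: Item) -> bool:
        return bool(item.claims.get(self.prop))


class LabelFromTitle(Constraint):
    """Fixable constraint: an item with a title should have an English label."""

    def validate(self, item: Item) -> bool:
        return "en" in item.labels or not item.claims.get(TITLE)

    def fix(self, item: Item) -> bool:
        if self.validate(item):
            return False  # nothing to fix
        item.labels["en"] = item.claims[TITLE][0]
        return True


class Episode:
    """Model that knows which constraints an episode item should satisfy."""

    def __init__(self, item: Item):
        self.item = item

    def constraints(self) -> List[Constraint]:
        return [HasProperty(PART_OF_SERIES), LabelFromTitle()]

    def failing(self) -> List[Constraint]:
        return [c for c in self.constraints() if not c.validate(self.item)]
```

For an episode that only has a title, both constraints fail; calling fix on each copies the title into the English label, while the missing P179 stays reported because HasProperty has no way to repair it. Keeping validate and fix separate means every constraint can be checked, but only some need to know how to correct an item.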
In order to run the scripts, you need to create a Bot account on Wikidata. Bot names usually end with the suffix "Bot". Once you have the appropriate credentials, create the following files:
user-config.py:
family = 'wikidata'
mylang = 'wikidata'
usernames['wikidata']['wikidata'] = u'YourBotName'
password_file = "user-password.py"
user-password.py:
(u'YourBotName', BotPassword(u'YourBotName', u'YourBotPassword'))
or:
(u'YourUserName@YourBotName', u'YourBotPassword')
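pywikibot reads these settings from user-config.py, and password_file points it at user-password.py. If you set up several machines, a small script can generate both files. The following is a minimal sketch; write_pywikibot_config is a hypothetical helper, not part of this repository. It writes the same lines as the templates, using repr() so quotes inside names or passwords are escaped. Restricting the password file to its owner is a precaution rather than a documented requirement, and chmod only has that effect on POSIX systems.

```python
import os
from pathlib import Path


def write_pywikibot_config(directory: str, username: str,
                           botpassword_name: str, botpassword: str) -> None:
    """Hypothetical helper: write user-config.py and user-password.py."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)

    # Same settings as the user-config.py template.
    config = (
        "family = 'wikidata'\n"
        "mylang = 'wikidata'\n"
        f"usernames['wikidata']['wikidata'] = {username!r}\n"
        "password_file = 'user-password.py'\n"
    )
    (root / "user-config.py").write_text(config)

    # Same tuple shape as the BotPassword template.
    password_path = root / "user-password.py"
    password_path.write_text(
        f"({username!r}, BotPassword({botpassword_name!r}, {botpassword!r}))\n"
    )
    # Keep the password file private (effective on POSIX systems).
    os.chmod(password_path, 0o600)
```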
Also see the Wikidata page on Bots.
Next, install the dependencies listed in the requirements.txt file. This is best done inside a virtualenv, using pip3:
virtualenv pywiki
source pywiki/bin/activate
pip3 install -r requirements.txt
- Checking individual items for constraint failures:
# Q65604139 = Season 1 of "Dark"
# Q65640227 Q65640226 Q65640224 = Episodes of "Dark"
python3 check_constraints.py Q65640227 Q65640226 Q65640224 Q65604139
- Checking the episodes of a series (Jessica Jones) for constraint failures:
# Q18605540 = Jessica Jones
python3 check_tv_show.py Q18605540 \
  --child_type=episode
- Checking and fixing the seasons of a series for constraint failures:
# Q18605540 = Jessica Jones
python3 check_tv_show.py Q18605540 \
  --child_type=season \
  --autofix
- Checking and fixing the episodes of a series for constraint failures, but waiting until all failures have been reported before fixing them at the end:
# Q18605540 = Jessica Jones
python3 check_tv_show.py Q18605540 \
  --child_type=episode \
  --autofix \
  --accumulate
- Fixing only the titles of episodes of a series:
# Q18605540 = Jessica Jones
python3 check_tv_show.py Q18605540 \
  --child_type=episode \
  --autofix \
  --accumulate \
  --filter title
An equivalent command, using the property ID for title (P1476), is:
# Q18605540 = Jessica Jones
python3 check_tv_show.py Q18605540 \
  --child_type=episode \
  --autofix \
  --accumulate \
  --filter P1476
- Get the list of episodes for The Neighborhood:
# This will write out two files:
# the-neighborhood-tv-series_S01.csv and
# the-neighborhood-tv-series_S02.csv
python3 -m cli.list_episodes "https://en.wikipedia.org/wiki/The_Neighborhood_(TV_series)" --episode-counts=21,22
- Create seasons in Wikidata:
# Create two seasons for Q7753382 (The Neighborhood)
python3 -m cli.create_seasons Q7753382 2
- Create the episodes in Wikidata:
python3 -m cli.create_episodes Q7753382 Q99419240 the-neighborhood-tv-series_S01.csv --quickstatements
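The --accumulate and --filter flags in the check_tv_show.py examples can be pictured with the following sketch. It is a hypothetical model, not check_tv_show.py's actual code: Failure, run, normalize_filter and FILTER_ALIASES are illustrative names. The only grounded detail is that --filter title and --filter P1476 select the same property, as the two equivalent commands show. Without --accumulate, each failure is fixed as soon as it is reported; with it, fixes are deferred until every failure has been reported.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical aliases: a --filter value may be a friendly name or a raw
# Wikidata property ID. P1476 is Wikidata's "title" property.
FILTER_ALIASES = {"title": "P1476"}


@dataclass
class Failure:
    qid: str                  # item that failed a constraint
    prop: str                 # property ID the constraint is about
    fix: Callable[[], None]   # applies the fix when called


def normalize_filter(value: Optional[str]) -> Optional[str]:
    """Map a friendly filter name to a property ID; pass IDs through."""
    if value is None:
        return None
    return FILTER_ALIASES.get(value.lower(), value.upper())


def run(failures: Iterable[Failure], autofix: bool, accumulate: bool,
        filter_value: Optional[str] = None) -> List[str]:
    """Report failures and optionally fix them; return an event log."""
    wanted = normalize_filter(filter_value)
    log: List[str] = []
    pending: List[Failure] = []
    for failure in failures:
        if wanted is not None and failure.prop != wanted:
            continue  # skip failures outside the filter
        log.append(f"report {failure.qid} {failure.prop}")
        if not autofix:
            continue
        if accumulate:
            pending.append(failure)  # defer until everything is reported
        else:
            failure.fix()
            log.append(f"fix {failure.qid} {failure.prop}")
    # In accumulate mode, all fixes happen here, after the last report.
    for failure in pending:
        failure.fix()
        log.append(f"fix {failure.qid} {failure.prop}")
    return log
```

Deferring the fixes lets you read the whole report (and abort) before any edit reaches Wikidata, at the cost of holding every pending fix in memory.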
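Judging from the comment in the list_episodes example, --episode-counts=21,22 yields one CSV per season, named after the Wikipedia article. Here is a minimal sketch of that splitting and naming step. It assumes the counts give the number of episodes in each season, in order. slug_from_url, split_by_counts, write_seasons and the CSV columns are hypothetical, not cli.list_episodes' actual implementation.

```python
import csv
import re
from pathlib import Path
from typing import List, Sequence
from urllib.parse import unquote, urlparse


def slug_from_url(url: str) -> str:
    """Turn a Wikipedia article URL into a file-name slug."""
    title = unquote(urlparse(url).path.rsplit("/", 1)[-1])
    # Collapse every run of non-alphanumeric characters into a single dash.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def split_by_counts(episodes: Sequence[str], counts: Sequence[int]) -> List[List[str]]:
    """Split a flat, in-order list of episode titles into seasons."""
    if sum(counts) != len(episodes):
        raise ValueError(f"counts add up to {sum(counts)}, "
                         f"but there are {len(episodes)} episodes")
    seasons: List[List[str]] = []
    start = 0
    for count in counts:
        seasons.append(list(episodes[start:start + count]))
        start += count
    return seasons


def write_seasons(url: str, episodes: Sequence[str], counts: Sequence[int],
                  out_dir: str = ".") -> List[Path]:
    """Write one CSV per season, e.g. <slug>_S01.csv, and return the paths."""
    slug = slug_from_url(url)
    paths: List[Path] = []
    for number, season in enumerate(split_by_counts(episodes, counts), start=1):
        path = Path(out_dir) / f"{slug}_S{number:02d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["episode_number", "title"])  # hypothetical columns
            for index, title in enumerate(season, start=1):
                writer.writerow([index, title])
        paths.append(path)
    return paths
```

Checking that the counts add up to the number of scraped episodes catches a mistyped --episode-counts value before any file is written.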
Some fixes are straightforward enough that they should not require supervision. The canned folder exposes these fixes as scripts that can be run directly, without any arguments. To preview the changes a script would make, run it with the --dry flag.
Example:
# Dry run mode, won't update labels
python3 -m canned.fix_missing_labels --dry
# Run after confirming that the changes look correct
python3 -m canned.fix_missing_labels
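The --dry flag follows a common dry-run pattern: compute every planned change and print it, but apply it only when not in dry-run mode. The following is a minimal sketch of that pattern, assuming a plain argparse flag. parse_args and apply_changes are hypothetical names, not the canned scripts' actual code.

```python
import argparse
from typing import Callable, List, Optional, Sequence, Tuple

# A planned change: a human-readable description plus a callable that applies it.
Change = Tuple[str, Callable[[], None]]


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical canned fix script")
    parser.add_argument("--dry", action="store_true",
                        help="print the changes without applying them")
    return parser.parse_args(argv)


def apply_changes(changes: Sequence[Change], dry: bool) -> List[str]:
    """Print every planned change; apply it only when dry is False.

    Returns the descriptions of the changes that were actually applied.
    """
    applied: List[str] = []
    for description, apply in changes:
        if dry:
            print(f"[dry run] would {description}")
            continue
        apply()
        applied.append(description)
        print(f"done: {description}")
    return applied


if __name__ == "__main__":
    labels = {}
    planned = [("set the English label of Q1 to 'Pilot'",
                lambda: labels.update({"Q1": "Pilot"}))]
    apply_changes(planned, dry=parse_args().dry)
```

Because the same list of planned changes feeds both modes, what the dry run prints is exactly what the real run would apply.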
Run pytest at the root of the repository. You should see something similar to:
================================== test session starts ==================================
platform darwin -- Python 3.7.6, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /foo/bar/baz/wikidata-toolkit
plugins: mock-3.3.1
collected 4 items
cli/test_cli.py .... [100%]
=================================== 4 passed in 3.40s ===================================
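pytest collects files named test_*.py (or *_test.py) and, inside them, functions whose names start with test. A new test file therefore needs no registration. The following is a minimal sketch. The file name and parse_episode_counts are hypothetical: the helper is modelled on the --episode-counts=21,22 flag of cli.list_episodes, is not a function from this repository, and is defined inline only to keep the example self-contained.

```python
# cli/test_episode_counts.py  (hypothetical file name)
from typing import List


def parse_episode_counts(value: str) -> List[int]:
    """Hypothetical helper: parse a value like "21,22" into [21, 22]."""
    counts = [int(part) for part in value.split(",") if part.strip()]
    if any(count <= 0 for count in counts):
        raise ValueError(f"episode counts must be positive: {value!r}")
    return counts


def test_parses_comma_separated_counts():
    assert parse_episode_counts("21,22") == [21, 22]


def test_ignores_whitespace_and_trailing_comma():
    assert parse_episode_counts(" 21, 22 ,") == [21, 22]


def test_rejects_non_positive_counts():
    try:
        parse_episode_counts("21,0")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

In a real test file the helper would be imported from the module under test, and pytest.raises would be the more idiomatic way to assert the exception.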
Hello there! If you are a Hacktoberfest 🎃 participant and wish to contribute to this repository, you can
- Pick an issue with the hacktoberfest label
- Fork this repository
- Clone your fork to your local machine
- Create a new branch
- Work on the issue on this new branch
- Push your branch to your fork
- Send a PR!