A comma-separated list of about 100,000 German nouns and their grammatical properties (case, number, gender) as a CSV file, plus a module to look up the data and parse compound words. Compiled from the German Wiktionary (WiktionaryDE).
The list can be found here: german_nouns/nouns.csv
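The CSV can also be read without the lookup module. The following is a minimal sketch that uses only the standard library `csv` module. It assumes that the file's first line is a header containing columns named `lemma` and `genus`, mirroring the keys in the lookup output below. Check the actual header of german_nouns/nouns.csv first, and adapt the column names if gender is stored differently. `rows_for_lemma` and `genders` are hypothetical helpers, not part of the package.

```python
import csv
from typing import Dict, Iterable, List


def rows_for_lemma(lines: Iterable[str], lemma: str) -> List[Dict[str, str]]:
    """Return every CSV row whose 'lemma' column equals `lemma`.

    `lines` can be an open file object or any iterable of CSV lines.
    Assumption: the first line is a header containing a 'lemma' column.
    A lemma may occur in several rows (e.g. homonyms), so a list is returned.
    """
    reader = csv.DictReader(lines)
    return [row for row in reader if row.get("lemma") == lemma]


def genders(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Collect the non-empty values of the (assumed) 'genus' column."""
    return [row["genus"] for row in rows if row.get("genus")]
```

To run it on the real file, open german_nouns/nouns.csv with encoding="utf-8" and newline="" (as the csv module recommends), then pass the file object as `lines`.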
If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:
```shell
pip install german-nouns
```
```python
from pprint import pprint

from german_nouns.lookup import Nouns

nouns = Nouns()

# Look up a word
word = nouns['Fahrrad']
pprint(word)
# Output:
# [{'flexion': {'akkusativ plural': 'Fahrräder',
#               'akkusativ singular': 'Fahrrad',
#               'dativ plural': 'Fahrrädern',
#               'dativ singular': 'Fahrrad',
#               'dativ singular*': 'Fahrrade',
#               'genitiv plural': 'Fahrräder',
#               'genitiv singular': 'Fahrrades',
#               'genitiv singular*': 'Fahrrads',
#               'nominativ plural': 'Fahrräder',
#               'nominativ singular': 'Fahrrad'},
#   'genus': 'n',
#   'lemma': 'Fahrrad',
#   'pos': ['Substantiv']}]

# Parse a compound word
words = nouns.parse_compound('Vermögensbildung')
print(words)
# Output:
# ['Vermögen', 'Bildung']
# Each component can now be looked up, e.g. nouns['Vermögen'].
```
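German compounds take their grammatical gender from their last component (das Vermögen + die Bildung gives die Vermögensbildung). Combining the two calls shown above is therefore a quick way to guess the gender of a compound that has no entry of its own. The sketch below assumes, based only on the example output, that nouns[word] returns a list of dicts that may carry a 'genus' key, and that parse_compound returns the components in order. How the lookup signals a missing word is not documented here, so the sketch treats both a KeyError and an empty result as "not found". `head_gender` is a hypothetical helper, not part of the package.

```python
from typing import Any, List, Optional


def head_gender(nouns: Any, compound: str) -> Optional[List[str]]:
    """Guess a compound's gender(s) from its last component (hypothetical helper).

    `nouns` is expected to behave like german_nouns.lookup.Nouns():
    - nouns.parse_compound(word) -> list of component nouns
    - nouns[word] -> list of entry dicts that may carry a 'genus' key
    Returns None when the compound cannot be split or the head is unknown.
    """
    parts = nouns.parse_compound(compound)
    if not parts:
        return None
    head = parts[-1]  # the last component determines the gender
    try:
        entries = nouns[head]
    except KeyError:  # assumption: a missing word might raise KeyError
        return None
    # Collect distinct genders in order; a head noun can have several entries.
    found: List[str] = []
    for entry in entries or []:
        genus = entry.get("genus")
        if genus and genus not in found:
            found.append(genus)
    return found or None
```

With the package installed, head_gender(Nouns(), 'Vermögensbildung') looks up the entry for 'Bildung' and returns the gender(s) recorded there.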
To compile the list yourself, you need Python 3.8+ and Poetry installed.
1. Clone the repository and install dependencies with Poetry:

```shell
$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install
```
2. Download the latest XML dump file from https://dumps.wikimedia.org/dewiktionary/latest, then run:

```shell
$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2
```

The CSV file will be saved to german_nouns/nouns.csv.

3. Delete german_nouns/index.txt so that the lookup methods recreate the word index the next time they are used.
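The clean-up step for the word index can also be scripted, for example after each rebuild of the CSV. Here is a minimal sketch using pathlib. `remove_word_index` is a hypothetical helper name, and the path it deletes is the german_nouns/index.txt location given above, taken relative to the repository root.

```python
from pathlib import Path
from typing import Union


def remove_word_index(repo_root: Union[str, Path] = ".") -> bool:
    """Delete german_nouns/index.txt under repo_root, if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    The lookup methods are then expected to rebuild the index on next use.
    """
    index_file = Path(repo_root) / "german_nouns" / "index.txt"
    if index_file.is_file():
        index_file.unlink()
        return True
    return False
```

Calling remove_word_index() from the repository root after regenerating nouns.csv has the same effect as deleting the file by hand.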