The UD Albanian Treebank is a small treebank for Standard Albanian, developed as part of a project at Uppsala University. The data was extracted from Wikipedia.
The UD Treebank for Standard Albanian (TSA) is a small treebank of 60 sentences (922 tokens) collected from various Wikipedia articles. It was built largely by hand following the Universal Dependencies guidelines. Lemmas were produced with the lemmatizer at https://bitbucket.org/timarkh/uniparser-albanian-grammar/src/master/, developed by the Albanian National Corpus team (Maria Morozova, Alexander Rusakov, Timofey Arkhangelskiy). Part-of-speech tagging and morphological analysis were semi-automated with Python scripts and then corrected manually, whereas dependency relations were assigned entirely by hand. We welcome any efforts to enlarge the treebank or improve its overall quality.
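The sentence and token counts given above can be checked directly against the released CoNLL-U data. The following is a minimal Python sketch, assuming the standard CoNLL-U layout: comment lines start with `#`, sentences are separated by blank lines, multiword tokens carry range IDs such as `3-4`, and empty nodes carry decimal IDs such as `2.1`. The function name `conllu_counts` is hypothetical and not part of any UD tool.

```python
from typing import Iterable, Tuple


def conllu_counts(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (sentences, surface tokens, syntactic words) for CoNLL-U lines.

    Hypothetical helper: a sketch of the CoNLL-U counting conventions,
    not an official UD validation script.
    """
    sentences = tokens = words = 0
    in_sentence = False
    range_end = 0  # last word ID covered by the current multiword token

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
            in_sentence = False
            range_end = 0
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# sent_id = ..." or "# text = ...".
            continue

        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            # Multiword token such as "3-4": one surface token spanning several words.
            _start, end = token_id.split("-")
            range_end = int(end)
            tokens += 1
        elif "." in token_id:
            # Empty node (enhanced graph only): neither a token nor a basic word.
            continue
        else:
            words += 1
            # Words inside a multiword range were already counted as one token.
            if int(token_id) > range_end:
                tokens += 1

    # Count a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens, words
```

To use it, open the treebank file with `encoding="utf-8"` and pass the file object to `conllu_counts`. The file name is an assumption to confirm against the repository listing; `sq_tsa-ud-test.conllu` is one plausible candidate. If the data matches the description above, the first two values should be 60 and 922. Tokens and words differ only when the data contains multiword tokens.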
This treebank was created by Marsida Toska at Uppsala University under the supervision of Joakim Nivre.
- 2020-05-15 v2.6
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.6
License: CC BY-SA 4.0
Includes text: yes
Genre: wiki
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Toska, Marsida
Contributing: here
Contact: [email protected]
===============================================================================