Skip to content

Latest commit

 

History

History
31 lines (25 loc) · 1.57 KB

README.md

File metadata and controls

31 lines (25 loc) · 1.57 KB

Summary

The UD Albanian Treebank is a small treebank for Standard Albanian, developed within a project framework at Uppsala University. The data was extracted from Wikipedia.

Introduction

The UD Treebank for Standard Albanian (TSA) is a small treebank that consists of 60 sentences corresponding to 922 tokens. The data was collected from different Wikipedia entries. This treebank was created mainly manually following the Universal Dependencies guidelines. The lemmatization was performed using the lemmatizer https://bitbucket.org/timarkh/uniparser-albanian-grammar/src/master/ developed by the Albanian National Corpus team (Maria Morozova, Alexander Rusakov, Timofey Arkhangelskiy). Tagging and Morphological Analysis were semi-automated through python scripts and corrected manually, whereas Dependency relations were assigned fully manually. We encourage any initiatives to increase the size and/or improve the overall quality of the Treebank.

Acknowledgments

This treebank was created by Marsida Toska at Uppsala University under the supervision of Joakim Nivre.

Changelog

  • 2020-05-15 v2.6
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.6
License: CC BY-SA 4.0
Includes text: yes
Genre: wiki
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Toska, Marsida
Contributing: here
Contact: [email protected]
===============================================================================