ParlamentParla

Speech corpus for Catalan. The version 2.0 corpus can be downloaded in parts:

Corpus information

This is the ParlamentParla speech corpus for Catalan, prepared by Col·lectivaT SCCL. The audio segments were extracted from recordings of the Catalan Parliament (Parlament de Catalunya) plenary sessions, which took place between 2007/07/11 and 2018/07/17. We aligned the transcriptions with the recordings and extracted the corpus. The content belongs to the Catalan Parliament and the data is released in conformance with their terms of use.

Preparation of this corpus was partly supported by the Department of Culture of the Catalan autonomous government, and v2.0 was supported by the Barcelona Supercomputing Center, within the framework of the AINA project of the Departament de Polítiques Digitals.

As of v2.0, the corpus is separated into 211 hours of clean-quality segments and 400 hours of other-quality segments. The detailed statistics are as follows:

Subcorpus	Gender	Duration (h)
other_test	F	2.516
other_dev	F	2.701
other_train	F	109.68
other_test	M	2.631
other_dev	M	2.513
other_train	M	280.196
other total		400.239
clean_test	F	2.707
clean_dev	F	2.576
clean_train	F	77.905
clean_test	M	2.516
clean_dev	M	2.614
clean_train	M	123.162
clean total		211.48
Total		611.719
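
The subtotals in the table can be cross-checked with a short script. The following is a minimal sketch, not part of the corpus tooling: the names `ROWS`, `totals_by_quality` and `hours_to_h_min` are hypothetical, and the durations are simply copied from the table above. Summing the rounded per-row values gives 400.237 h for "other", which differs from the published 400.239 h in the third decimal, presumably because the published rows are themselves rounded.

```python
from collections import defaultdict

# Durations in hours, copied by hand from the table above.
ROWS = [
    ("other_test", "F", 2.516),
    ("other_dev", "F", 2.701),
    ("other_train", "F", 109.68),
    ("other_test", "M", 2.631),
    ("other_dev", "M", 2.513),
    ("other_train", "M", 280.196),
    ("clean_test", "F", 2.707),
    ("clean_dev", "F", 2.576),
    ("clean_train", "F", 77.905),
    ("clean_test", "M", 2.516),
    ("clean_dev", "M", 2.614),
    ("clean_train", "M", 123.162),
]


def totals_by_quality(rows):
    """Sum durations per quality prefix ('clean' or 'other')."""
    totals = defaultdict(float)
    for subcorpus, _gender, hours in rows:
        quality = subcorpus.split("_")[0]
        totals[quality] += hours
    return dict(totals)


def hours_to_h_min(hours):
    """Convert decimal hours to (whole hours, minutes rounded to nearest)."""
    whole = int(hours)
    minutes = round((hours - whole) * 60)
    if minutes == 60:  # carry when rounding reaches a full hour
        whole, minutes = whole + 1, 0
    return whole, minutes


totals = totals_by_quality(ROWS)
grand_total = sum(totals.values())

for quality, hours in sorted(totals.items()):
    print(f"{quality}: {hours:.3f} h")
h, m = hours_to_h_min(grand_total)
print(f"total: {grand_total:.3f} h (about {h} h {m} min)")
```

The same grouping could be keyed on gender or on the train/dev/test split by changing which field of each row is used as the dictionary key.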

For more information go to https://collectivat.cat/asr

Revision log

2.0: Major changes in the file structure; speaker ids with respective genders added. The speakers of train, test and dev corpora do not overlap. A major increase in size with a total time of 611 hours 43 minutes.
1.0: Much better quality due to improved segmentation, corpus separated into clean and other.
0.2: First public release of approx. 320 hours.
