This repository contains documentation and code for building a German language model using the fastai library and applying it to a variety of NLP tasks such as text classification. The language model is based on the 3-layer AWD-LSTM architecture first published by Salesforce Research.
The backbone of the model is trained on the German Wikipedia corpus; transfer learning is then used to apply it to text classification tasks (as described in Universal Language Model Fine-tuning for Text Classification).
Update:
A pre-trained Language Model using the German Wikipedia Corpus is available from this website: https://lernapparat.de/german-lm/. Thanks for sharing, Thomas!
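For orientation, the sketch below outlines what the ULMFiT workflow described above could look like with the fastai v1 API: fine-tune the Wikipedia-pretrained language model on the target corpus, save its encoder, and reuse it for a sentiment classifier. All file names (`sb10k_train.csv`, `german_lm.pth`, `german_itos.pkl`), column names (`text`, `sentiment`) and hyperparameters are illustrative assumptions, not part of this repository; the notebooks under `scr/` contain the actual experiments.

```python
from fastai.text import *

# Assumed data location; weights/vocab downloaded from lernapparat.de are expected here.
path = Path('data')

# 1) Fine-tune the Wikipedia-pretrained language model on the target corpus.
data_lm = TextLMDataBunch.from_csv(path, 'sb10k_train.csv', text_cols='text')
lm_learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, pretrained=False)
# Load the German Wikipedia weights (.pth) and the matching vocabulary (.pkl).
lm_learn.load_pretrained(path/'german_lm.pth', path/'german_itos.pkl')
lm_learn.fit_one_cycle(1, 1e-2)
lm_learn.save_encoder('ft_enc')  # keep the fine-tuned encoder for the classifier

# 2) Build the sentiment classifier on top of the fine-tuned encoder.
data_clas = TextClasDataBunch.from_csv(path, 'sb10k_train.csv',
                                       text_cols='text', label_cols='sentiment',
                                       vocab=data_lm.vocab)
clf_learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf_learn.load_encoder('ft_enc')
clf_learn.fit_one_cycle(1, 1e-2)
```

After training, `clf_learn.predict(...)` can be used to classify individual texts.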
data/
-- pre-trained German language model (available from https://lernapparat.de/german-lm/)
doc/
-- documentation and implementation notes
sb-10k_german_sentiment_classification/
-- raw data for the SB-10k Corpus
scr/
-- notebooks used for various experiments on NLP classification
Notebook | Task |
---|---|
sb-10k-use_pretrained_language_model.ipynb | classifier for SB-10k Corpus (built on pre-trained language model) |
sb-10k_small_wikipedia_corpus.ipynb | classifier for SB-10k Corpus (built on self-trained language model using German Wikipedia) |
sb-10k-data_preprocessing.ipynb | data pre-processing steps for SB-10k: German Sentiment Corpus |
- fine-tune and evaluate the classifier using the SB-10k: German Sentiment Corpus
to be updated
For more information, please feel free to contact me via e-mail ([email protected])