Word Embedding - Hebrew

Note: This code works for Hebrew, but it should work for any other language as well.
  1. Download the Hebrew dataset from Wikipedia.

    On Linux this can be done easily with:

    wget https://dumps.wikimedia.org/hewiki/latest/hewiki-latest-pages-articles.xml.bz2

  2. Install gensim: pip install --upgrade gensim (see https://radimrehurek.com/gensim/install.html)

  3. Create the corpus: python create_corpus.py

    • It will create wiki.he.text (see the sketch after this list for what such a script typically does).
  4. Train the model from a Python prompt:

    • import word2vec
    • word2vec.train()
  5. Explore the model using a Jupyter notebook. You can use the supplied playingWithHebModel.ipynb as a starting point.
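create_corpus.py itself is not reproduced in this README; the following is a minimal sketch of how such a corpus-extraction step typically looks with gensim's WikiCorpus. The file names come from the steps above; the function name and parameters are illustrative assumptions.

```python
# Sketch only (assumed implementation): extract plain text from the Wikipedia
# dump using gensim's WikiCorpus, writing one article per line to wiki.he.text.
from gensim.corpora import WikiCorpus

def make_corpus(dump_path="hewiki-latest-pages-articles.xml.bz2",
                out_path="wiki.he.text"):
    # dictionary={} skips building a Dictionary, which is not needed for text extraction
    wiki = WikiCorpus(dump_path, dictionary={})
    with open(out_path, "w", encoding="utf-8") as out:
        for i, tokens in enumerate(wiki.get_texts(), start=1):
            out.write(" ".join(tokens) + "\n")  # tokens are strings in gensim 4.x
            if i % 10000 == 0:
                print(f"Processed {i} articles")

if __name__ == "__main__":
    make_corpus()
```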

Word2Vec

  • Train with inp = "wiki.he.text" and out_model = "wiki.he.word2vec.model"
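word2vec.train() is not shown here; below is a minimal sketch of an equivalent training run, assuming gensim 4.x. Only the file names come from this README; the hyperparameters are illustrative assumptions, not the repository's values.

```python
# Sketch only (assumed implementation): train a Word2Vec model on the
# one-article-per-line corpus and save it under the name used in this README.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

def train(inp="wiki.he.text", out_model="wiki.he.word2vec.model"):
    sentences = LineSentence(inp)  # streams tokenized lines from disk
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
    model.save(out_model)
    return model
```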

FastText

pip install fasttext

  • Train with inp = "wiki.he.text", out_model = "wiki.he.fasttext.model", and alg = "skipgram"
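A minimal sketch of the corresponding FastText training call, assuming the official fasttext package (0.9+). The function below is not the repository's code; it is an equivalent built on fasttext.train_unsupervised.

```python
# Sketch only (assumed implementation): unsupervised skipgram training with the
# fasttext package; alg can be "skipgram" or "cbow".
import fasttext

def train(inp="wiki.he.text", out_model="wiki.he.fasttext.model", alg="skipgram"):
    model = fasttext.train_unsupervised(inp, model=alg)
    model.save_model(out_model)
    return model
```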

Test

Testing specific Hebrew analogies such as:

פריז + גרמניה - צרפת = ברלין (Paris + Germany - France = Berlin)

גבר + מלכה - מלך = אישה (man + queen - king = woman)
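A minimal sketch of how such analogies can be queried once the model is trained, assuming the Word2Vec model file saved above and gensim's most_similar API:

```python
# Sketch only: analogy queries against the trained model with gensim.
from gensim.models import Word2Vec

model = Word2Vec.load("wiki.he.word2vec.model")

# Paris + Germany - France should rank Berlin highly
print(model.wv.most_similar(positive=["פריז", "גרמניה"], negative=["צרפת"], topn=3))

# man + queen - king should rank woman highly
print(model.wv.most_similar(positive=["גבר", "מלכה"], negative=["מלך"], topn=3))
```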