A single-layer neural network written from scratch that predicts the language of the text based on the proportions of letters.
Articles scraped from Wikipedia. The neural network was trained on 5 languages (English, German, Polish, Spanish and French), 10 articles each. However, the code is generic and can be applied to any number of languages.
English sample:
German sample:
Polish sample:
- Dynamic reading of multiple data files from a various number of directories
- Calculating vectors of letter proportions in each language (ASCII characters only)
- Implementing a single-layer neural network of k perceptrons (where k = number of languages) from scratch
- Libraries used for data loading and preprocessing: pandas, numpy
- OOP and clean code
- Unit testing with pytest