Giuseppe Bonaccorso - www.bonaccorso.eu
News datasets (raw and preprocessed) can be downloaded from the Insight Project Resources website
Requirements: Scikit-Learn, NLTK, Gensim, Keras (with Theano or TensorFlow)
Theoretical references:
- D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006
- Metsis, Vangelis, Ion Androutsopoulos, and Georgios Paliouras. “Spam Filtering with Naive Bayes - Which Naive Bayes?”, CEAS, 27–28, 2006
- Zhang, Harry. “The Optimality of Naive Bayes”, AA 1, no. 2 (2004): 3
- Le, Quoc V., and Tomas Mikolov. “Distributed Representations of Sentences and Documents”, ICML, 14:1188–1196, 2014
- Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation of Word Representations in Vector Space”, arXiv Preprint arXiv:1301.3781, 2013
- Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. “Distributed Representations of Words and Phrases and Their Compositionality”, Advances in Neural Information Processing Systems, 3111–3119, 2013
- Bonaccorso G., Reuters-21578-Classification using Word2Vec and LSTM
- Bonaccorso G., Twitter Sentiment Analysis with Gensim Word2Vec and Keras Convolutional Networks