Here, I explore some machine learning techniques for handling text data.
- Literature
My current research is on graph-based networks for fact verification rather than sentiment analysis, so I looked through the relevant literature to familiarize myself with the area.
The papers are documented in "literature".
- Exploration and development of models
I use a Jupyter Notebook to document my findings and progress. In it, I clean and explore the dataset, implement a simple logistic regression model that uses TF-IDF to encode the text, and run a simple pre-trained BERT model for comparison.
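As a rough illustration of the TF-IDF plus logistic regression baseline, here is a minimal sketch using scikit-learn. It is not the notebook's actual code: the toy texts, the binary labels (1 = positive, 0 = negative) and the `max_iter` setting are assumptions made up for this example, and the real dataset and preprocessing live in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the real dataset.
texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "I loved every minute of it",
    "a dull and boring film",
    "terrible acting and a weak story",
    "I hated every minute of it",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = positive, 0 = negative

# TfidfVectorizer turns each text into a sparse TF-IDF vector;
# LogisticRegression is then fitted on those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Predict a label for an unseen (also hypothetical) text.
preds = model.predict(["a great and moving story"])
print(preds)
```

Wrapping both steps in a pipeline means the TF-IDF vocabulary is fitted only on the data passed to `fit`, which helps avoid leaking test-set vocabulary into training when the same object is later used with a train/test split.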
Both models perform reasonably well, with classification accuracy, precision, recall and F1-scores of about 90%. I chose these metrics together to give a holistic picture of classification performance, since accuracy alone can hide how a model treats each class.
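To make the four metrics concrete, the sketch below computes them from scratch for a binary task. The function name `binary_metrics` and the example labels are hypothetical and purely illustrative; in practice a library such as scikit-learn would likely be used for this.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision is the fraction of predicted positives that are correct, recall is the fraction of actual positives that were found, and F1 is their harmonic mean, so it only scores high when both are high.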
To get this up and running easily, I recommend installing Anaconda.