SenTop combines sentiment analysis and topic modeling into a single capability, so that sentiments can be derived per topic and topics can be derived per sentiment.
To install from PyPI, use:
pip install sentop
Create a SenTop object and pass your list of documents to run_analysis().
st = SenTop()
st.run_analysis(docs, annotation="My dataset")
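
The call above takes `docs` as a list of documents. The following is a minimal sketch of preparing such a list, assuming each document is a plain string and that blank entries should be dropped. `prepare_docs` is a hypothetical helper for illustration, not part of SenTop.

```python
def prepare_docs(raw_docs):
    """Return a list of non-empty, whitespace-trimmed document strings."""
    cleaned = []
    for doc in raw_docs:
        text = str(doc).strip()
        if text:  # skip entries that are empty after trimming
            cleaned.append(text)
    return cleaned


# Example input reusing documents that appear in this README.
docs = prepare_docs([
    "  Teleworking at home.\n",
    "",
    "Things are good. Im ready to do the mission.",
])
# docs can then be passed to run_analysis() as in the snippet above.
```
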
Sentiment analysis is performed using AdaptNLP with state-of-the-art (SOTA) Hugging Face Transformers. SenTop provides multiple sentiment analyses (confidence scores also available):
- RoBERTa Base Sentiment for 3-class polarity -- based on Facebook AI's RoBERTa
- BERT Base Multilingual Uncased Sentiment for 5-class polarity -- based on Google's Bidirectional Encoder Representations from Transformers (BERT)
- Twitter roBERTa-base for Emotion Recognition for 4-class emotion recognition
- BERT-base-cased GoEmotions (Original) for 28-class emotion recognition
- Twitter roBERTa-base for Offensive Language Identification for 2-class offensive language recognition
SenTop provides two types of topic modeling: Latent Dirichlet Allocation (LDA) using Tomotopy, and transformer-based BERTopic. LDA is the de facto standard for statistical topic modeling, while BERTopic provides SOTA-level performance using Hugging Face Transformers. Transformers that have been tested include:
- BERT Base Uncased -- based on Google's Bidirectional Encoder Representations from Transformers (BERT)
- XLM RoBERTa Base -- based on XLM-RoBERTa
SenTop combines sentiment analysis and topic modeling by computing both at the document level for a corpus. The results can then be represented as a table, as shown below.
Document | BERT Topic | LDA Topic | 3-Class Sentiment | 5-Class Sentiment |
---|---|---|---|---|
"Having to report to work without being provided PPE." | 3 | 0 | negative | 1_star |
"Teleworking at home." | 1 | 2 | neutral | 3_stars |
"Things are good. Im ready to do the mission." | 2 | 1 | positive | 4_stars |
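
Once every document carries both a topic and a sentiment, sentiments per topic and topics per sentiment are simple group-and-count operations over the rows. The sketch below illustrates this with plain Python on the three rows from the table above. The row layout, the key names (`bert_topic`, `sentiment3`), and the `cross_tab` helper are assumptions made for illustration and do not reflect SenTop's actual output format or API.

```python
from collections import Counter, defaultdict

# Rows copied from the table above (BERT topic and 3-class sentiment only).
# The dictionary layout is hypothetical, not SenTop's real output structure.
rows = [
    {"doc": "Having to report to work without being provided PPE.",
     "bert_topic": 3, "sentiment3": "negative"},
    {"doc": "Teleworking at home.",
     "bert_topic": 1, "sentiment3": "neutral"},
    {"doc": "Things are good. Im ready to do the mission.",
     "bert_topic": 2, "sentiment3": "positive"},
]


def cross_tab(rows, group_key, value_key):
    """Count how often each value_key label occurs within each group_key label."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[value_key]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {group: dict(counter) for group, counter in counts.items()}


# Sentiments derived per topic.
sentiments_per_topic = cross_tab(rows, "bert_topic", "sentiment3")

# Topics derived per sentiment: the same counting with the keys swapped.
topics_per_sentiment = cross_tab(rows, "sentiment3", "bert_topic")
```

With only one document per topic in this small table, each count is 1. On a real corpus the same grouping would show, for example, which topics attract mostly negative documents.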