dhs-gov/sentop


SenTop

Python 3.8

SenTop combines sentiment analysis and topic modeling into a single capability allowing for sentiments to be derived per topic and for topics to be derived per sentiment.

Installation

To install from PyPI, use:

pip install sentop

Quick Start

Create a SenTop object and pass your list of documents to run_analysis().

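# Assumes SenTop has already been imported from the sentop package.
docs = ["Teleworking at home.", "Things are good. Im ready to do the mission."]
st = SenTop()
st.run_analysis(docs, annotation="My dataset")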

Sentiment Analysis

Sentiment analysis is performed using AdaptNLP with state-of-the-art (SOTA) Hugging Face Transformers. SenTop provides multiple sentiment analyses, with confidence scores also available:

  1. RoBERTa Base Sentiment for 3-class polarity -- based on Facebook AI's RoBERTa
  2. BERT Base Multilingual Uncased Sentiment for 5-class polarity -- based on Google's Bidirectional Encoder Representations from Transformers (BERT)
  3. Twitter roBERTa-base for Emotion Recognition for 4-class emotion recognition
  4. BERT-base-cased GoEmotions (Original) for 28-class emotion recognition
  5. Twitter roBERTa-base for Offensive Language Identification for 2-class offensive language recognition
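
To illustrate what a label with a confidence score means for the classifiers listed above, here is a minimal sketch that turns raw per-class scores into a predicted label and a probability. It is a generic illustration, not SenTop's or AdaptNLP's implementation: the label order, the `classify` helper, and the input scores are all assumptions made for this example.

```python
import math

# Label order for a 3-class polarity model (an assumption for this sketch).
LABELS = ["negative", "neutral", "positive"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels=LABELS):
    """Return the most probable label and its confidence score."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up raw scores for a single document.
label, confidence = classify([2.0, 0.5, -1.0])
print(label, round(confidence, 3))
```

The same idea extends to the 5-class, 4-class, 28-class, and 2-class models by swapping in the matching label list.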

Topic Modeling

SenTop provides two types of topic modeling: Latent Dirichlet Allocation (LDA) using Tomotopy, and transformer-based BERTopic. LDA is the de facto standard for statistical topic modeling, while BERTopic provides SOTA-level performance using Hugging Face Transformers. Transformers that have been tested include:

  1. BERT Base Uncased -- based on Google's Bidirectional Encoder Representations from Transformers (BERT)
  2. XLM RoBERTa Base -- based on XLM-RoBERTa

Combining Sentiment Analysis and Topic Modeling

SenTop combines sentiment analysis and topic modeling by computing both at the document level for a corpus. The results can then be represented as a table, as shown below.

| Document | BERT Topic | LDA Topic | 3-Class Sentiment | 5-Class Sentiment |
| --- | --- | --- | --- | --- |
| "Having to report to work without being provided PPE." | 3 | 0 | negative | 1_star |
| "Teleworking at home." | 1 | 2 | neutral | 3_stars |
| "Things are good. Im ready to do the mission." | 2 | 1 | positive | 4_stars |
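
Once both results exist per document, sentiments per topic and topics per sentiment are simple group-and-count operations over the rows. The sketch below shows one way to do this with the three example rows above. The field names (`bert_topic`, `class3`, and so on) and the `count_by` helper are hypothetical and do not reflect SenTop's actual output schema.

```python
from collections import Counter, defaultdict

# Document-level results copied from the example table.
# Field names are illustrative, not SenTop's real output format.
rows = [
    {"doc": "Having to report to work without being provided PPE.",
     "bert_topic": 3, "lda_topic": 0, "class3": "negative", "class5": "1_star"},
    {"doc": "Teleworking at home.",
     "bert_topic": 1, "lda_topic": 2, "class3": "neutral", "class5": "3_stars"},
    {"doc": "Things are good. Im ready to do the mission.",
     "bert_topic": 2, "lda_topic": 1, "class3": "positive", "class5": "4_stars"},
]


def count_by(rows, group_key, value_key):
    """Count how often each value of value_key occurs within each group."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[value_key]] += 1
    return {group: dict(counter) for group, counter in counts.items()}


# Sentiments per topic: which sentiments occur within each BERTopic topic.
sentiment_per_topic = count_by(rows, "bert_topic", "class3")

# Topics per sentiment: which topics occur within each sentiment class.
topics_per_sentiment = count_by(rows, "class3", "bert_topic")

print(sentiment_per_topic)
print(topics_per_sentiment)
```

Swapping `"bert_topic"` for `"lda_topic"`, or `"class3"` for `"class5"`, gives the same breakdown for the other columns.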
