Knowledge Center
This page explains the key concepts in the package and how aspects and opinions are extracted.
A review dataset typically looks like this:
We can see the levels of granularity:
Review data → comment by a single customer → sentences in a comment
Here we assume that each customer comments only once. “Comment by a single customer” is an important level for sentiment analysis, because every customer has an overall attitude towards the product. A customer may be happy with one aspect and dissatisfied with another but still love using the product; that is positive feedback overall, although the sentiment score will be lower than that of a customer who has no complaints. Some customers write a lot and some leave only one sentence. If we simply calculate sentiment scores on individual sentences regardless of who wrote them, the sentiment of customers who write a lot will be over-weighted and the sentiment of customers who write little will be under-weighted.
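A small worked example makes the weighting problem concrete. The customers and scores below are hypothetical, and for simplicity each customer's sentences are averaged with a plain mean:

```python
from statistics import mean

# Hypothetical sentence-level sentiment scores, grouped by customer.
reviews = {
    "customer_a": [0.5, 0.5, 0.5, 0.5],  # writes four positive sentences
    "customer_b": [-0.5],                # leaves one negative sentence
}

# Pooling every sentence: customer_a supplies 4 of the 5 scores.
pooled = mean(s for scores in reviews.values() for s in scores)

# Averaging per customer first: each customer counts exactly once.
per_customer = mean(mean(scores) for scores in reviews.values())

print(round(pooled, 2), round(per_customer, 2))
```

Pooling gives (4 × 0.5 − 0.5) / 5 = 0.3, which looks clearly positive, while averaging per customer gives (0.5 − 0.5) / 2 = 0.0, reflecting that one customer was happy and one was not.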
“Sentences in a comment” is important for aspect analysis. We will define aspects below, but imagine we want to know what customers think about a specific aspect of a product. Customers usually mention a given aspect only once in a review, so breaking comments down into sentences makes it easier to isolate what is said about each aspect.
Aspects are the attributes of a product that affect customers’ attitudes towards it. For a hotel, aspects can be “bed”, “breakfast”, “location”, “staff”... For a restaurant, aspects can be “vibe”, “food”, “staff”, “music”...
Opinions are the words customers use to describe the aspects, so each opinion is associated with an aspect.
- How do we extract aspects?
We extract the noun phrases and nouns from the sentences.
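As an illustration only, here is a minimal sketch of pulling nouns and noun phrases out of a sentence that has already been part-of-speech tagged (for example with TextBlob's `tags` property, which returns `(word, tag)` pairs using Penn Treebank tags). The function name is hypothetical, and treating a "noun phrase" as a run of consecutive nouns is a simplification; the package's own extractor may chunk phrases differently.

```python
from typing import List, Tuple

# Penn Treebank noun tags (singular, plural, proper singular, proper plural).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def extract_aspect_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return multi-word noun runs and single nouns as lowercase aspect candidates."""
    candidates: List[str] = []
    run: List[str] = []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the final run
        if tag in NOUN_TAGS:
            run.append(word.lower())
            continue
        if run:
            if len(run) > 1:
                candidates.append(" ".join(run))  # noun phrase, e.g. "room service"
            candidates.extend(run)                # the individual nouns
            run = []
    return candidates

tagged = [("The", "DT"), ("breakfast", "NN"), ("was", "VBD"), ("great", "JJ"),
          ("and", "CC"), ("the", "DT"), ("room", "NN"), ("service", "NN"),
          ("was", "VBD"), ("fast", "JJ")]
print(extract_aspect_candidates(tagged))
# ['breakfast', 'room service', 'room', 'service']
```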
- How do we extract opinions?
We extract aspects and opinions at the sentence level and then merge the results from all sentences. Within a sentence, we extract the opinions for each aspect as follows:
This produces a dictionary of aspects and opinions for a single sentence. We repeat this for every sentence and merge the dictionaries, ending up with one large dictionary that holds all the aspects and opinions. For each aspect, the “opinion” entry is the collection of all its opinion words.
There are other ways to extract aspects and opinions, such as topic modeling and supervised learning. We chose a rule-based approach because it is more predictable and controllable, so users can get meaningful insights without having to learn and tune an algorithm. Some edge cases are not yet handled, and in the future we will add more rules so that the extraction works well in more scenarios.
The sentiment score shows how positively or negatively a customer reviews the product/service, usually ranging from -1 to 1. We are interested in two kinds of sentiment: sentiment for comments and sentiment for aspects.
- Sentiment for a sentence
Using functions in TextBlob, this is polarity * subjectivity.
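A minimal sketch of the sentence score, assuming TextBlob's default analyzer, whose `sentiment` property returns a `(polarity, subjectivity)` named tuple with polarity in [-1, 1] and subjectivity in [0, 1]. The function names are hypothetical, and the analyzer is passed in as a parameter so the logic can be exercised without TextBlob installed.

```python
from typing import Callable, Tuple

# An analyzer maps a sentence to (polarity, subjectivity).
Analyzer = Callable[[str], Tuple[float, float]]

def textblob_analyzer(sentence: str) -> Tuple[float, float]:
    """Default analyzer backed by TextBlob (requires `pip install textblob`)."""
    from textblob import TextBlob  # imported lazily so this module loads without it
    result = TextBlob(sentence).sentiment  # Sentiment(polarity, subjectivity)
    return result.polarity, result.subjectivity

def sentence_sentiment(sentence: str, analyzer: Analyzer = textblob_analyzer) -> float:
    """Polarity scaled by subjectivity; the product stays within [-1, 1]."""
    polarity, subjectivity = analyzer(sentence)
    return polarity * subjectivity
```

With TextBlob installed, `sentence_sentiment("The breakfast was great.")` returns the product directly. Multiplying by subjectivity pulls factual-sounding sentences towards 0, so opinionated sentences dominate the score.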
- Sentiment for a comment
This is the average sentiment score of the sentences that have non-zero sentiment scores.
- Sentiment for an aspect
First calculate the sentiment of each opinion word, then take the average sentiment score of the words that have non-zero sentiment scores.
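Both the comment score and the aspect score average only the non-zero values, which can be sketched with one shared helper. Returning 0.0 when every score is zero is an assumption of this sketch, as are the function names. `word_score` stands in for whatever per-word scorer is used (for instance the polarity × subjectivity rule applied to a single word), and the lexicon values below are hypothetical.

```python
from typing import Callable, Iterable, List

def nonzero_mean(scores: Iterable[float]) -> float:
    """Average only the non-zero scores; 0.0 if there are none (assumption)."""
    nonzero = [s for s in scores if s != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def comment_sentiment(sentence_scores: List[float]) -> float:
    """Comment score: mean of the sentences with non-zero sentiment."""
    return nonzero_mean(sentence_scores)

def aspect_sentiment(opinion_words: List[str], word_score: Callable[[str], float]) -> float:
    """Aspect score: mean of the opinion words with non-zero sentiment."""
    return nonzero_mean(word_score(w) for w in opinion_words)

# Hypothetical per-word scores, for illustration only.
lexicon = {"clean": 0.36, "spacious": 0.0, "noisy": -0.1}

print(comment_sentiment([0.4, 0.0, -0.2]))  # (0.4 - 0.2) / 2
print(aspect_sentiment(["clean", "spacious", "noisy"], lambda w: lexicon.get(w, 0.0)))
```

Skipping zeros keeps neutral sentences (or neutral words such as "spacious" in this made-up lexicon) from diluting the average towards 0.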