Natalie Kudan
What is sentiment analysis and how it works
AI solutions have multiple uses – from pure data science to e-commerce and self-driving vehicles – and they cover a variety of domains, including NLP (Natural Language Processing). Any AI solution has an “ML model” at its core, built by a training algorithm – a set of coded instructions from ML engineers that tells the solution how to learn from data.
Language models and sentiment analysis algorithms
With NLP-related solutions (for example, a virtual assistant like Siri), ML models are known as “language models.” Some of them are “Large Language Models” or “LLMs” for short. As is the case with all other ML models, alongside instructions from ML engineers, these models also require annotated datasets. Acting as information-packed textbooks, these sets provide hands-on examples and definitions.
After an initial training stage (known as “pre-training”), AI solutions based on language models can be tailored to specific NLP tasks. One of them is known as “sentiment analysis.” This is when an AI solution can understand and describe or rate the moods, attitudes, or opinions expressed in a text.
In this article, we’ll look at sentiment analysis in more detail, talk about its types and applications, and also explain how data labeling at Toloka can support AI solutions that carry out sentiment analysis.
NLP and Transformers
Before we can jump into sentiment analysis, we first need to look at language models that enable it. Until about five years ago, language models in ML were quite limited, because they could not understand meaning in context. In linguistics, this is known as “semantics” and “pragmatics.”
Previous language models like GloVe or Word2vec proved useful, but they generated word-embedding representations that were largely context-free. In other words, AI solutions that relied on these models could not understand how different words in the same sentence related to one another in various ways.
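To see what “context-free” means in practice, here is a minimal sketch (assuming the gensim library and its pretrained “glove-wiki-gigaword-100” vectors, which are our illustrative choices): the word “bank” is assigned exactly the same vector whether the surrounding sentence is about a river bank or a bank account.

# A minimal sketch showing that classic word embeddings are context-free.
# Pretrained GloVe vectors are loaded through gensim's downloader utility.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # one fixed 100-dimensional vector per word

# Whatever sentence the word appears in, "bank" maps to the same vector,
# so "river bank" and "bank account" cannot be distinguished at this level.
print(glove["bank"].shape)                 # (100,)
print(glove.most_similar("bank", topn=3))  # nearest neighbors of the single "bank" vector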
This all changed in 2018 when Google put out a groundbreaking language model known as BERT (Bidirectional Encoder Representations from Transformers). BERT is built from the encoder part of the Transformer architecture, and its bidirectional self-attention connects every token of the input to every other token. As a result, BERT became one of the first language models capable of understanding how words relate to each other in context. For example, let’s look at this sentence:
Jerry threw the ball to his dog, but it bounced away and ended up in his neighbor’s yard.
As humans, we all understand that the pronoun “it” refers to the ball, not Jerry or his dog, because we understand how each word is connected to all other words in this sentence. However, this was a real problem with analyzing human language until recently – machines did not possess the same level of understanding.
BERT made a real breakthrough, because it could get context. Other Transformer models soon followed that improved on BERT, namely RoBERTa (by Facebook AI) and DistilBERT (by Hugging Face), as well as the GPT family (by OpenAI), which takes a different approach to the task.
From BERT to a sentiment analysis model
It’s important to understand that even with a Transformer model backing it, an AI application cannot simply go ahead and perform sentiment analysis. Pretrained models like BERT provide a solid foundation for sentiment analysis models – that’s why they’re also known as “foundation” models – but they’re not task-specific. In order for an AI application to carry out specific NLP tasks such as sentiment analysis, further training on task-specific data is required.
BERT is pretrained in a self-supervised way on huge volumes of unlabeled text known as “corpora” (singular – “corpus”). What this means essentially is that ML engineers feed a bunch of unlabeled data into the model and let it determine connections and associations between words on its own. What follows is a model that can predict “masked” words. This means that BERT can reliably identify a missing word in a sentence based on the other words in that sentence. This is a great start, but it’s insufficient for accurate sentiment analysis.
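To make masked-word prediction concrete, here is a minimal sketch using the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint (both are our illustrative choices, not the only way to do this):

# A minimal sketch of BERT-style masked-word prediction
# using the Hugging Face transformers library.
from transformers import pipeline

# Wrap a pretrained BERT checkpoint in a fill-mask pipeline.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT suggests the most likely words for the [MASK] position
# based on every other word in the sentence.
for prediction in unmasker("Jerry threw the [MASK] to his dog."):
    print(prediction["token_str"], round(prediction["score"], 3))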
What is needed to prepare a language model like BERT for sentiment analysis is known in ML as “fine-tuning for downstream applications” or simply “fine-tuning.” During fine-tuning, an additional classification layer (a “head”) is added on top of BERT, and the combined model is trained on an annotated dataset of texts so that it learns to predict sentiment labels that capture particular connotations (ideas, feelings, or subject-specific messages). These datasets can be obtained in several ways, not least through crowdsourcing with platforms like Toloka.
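As a rough sketch of what this fine-tuning step can look like in code (assuming the Hugging Face transformers and datasets libraries and a hypothetical labeled_reviews.csv file with “text” and “label” columns – none of these specifics are prescribed by BERT itself):

# A minimal fine-tuning sketch: BERT plus an added classification head,
# trained on an annotated sentiment dataset (illustrative file and labels).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Three sentiment labels: negative, neutral, positive.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

# "labeled_reviews.csv" is a hypothetical annotated dataset.
dataset = load_dataset("csv", data_files="labeled_reviews.csv")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-bert", num_train_epochs=3),
    train_dataset=dataset,
)
trainer.train()  # the new head and BERT's pretrained weights are updated together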
The GPT family
GPT (Generative Pre-trained Transformer) is a unidirectional autoregressive decoder with far more parameters than BERT. This means that (a) texts are always processed in one direction, and (b) the model is much larger.
Compared to GPT, BERT is more easily customizable, and hence it’s arguably more useful to ML engineers for many NLP-specific downstream applications, for example when building their own sentiment analysis solution. However, GPT (being the Goliath of LLMs, with 175 billion parameters in GPT-3) tends to outperform BERT out of the box, i.e., when there’s little to no tweaking. As a result, it’s more of a “ready” language model. The same maker (OpenAI) has also put out DALL-E, which uses a GPT-style model to generate images from text descriptions.
GPT is considered a state-of-the-art model, which means it’s basically as good a standard as we have in the field of NLP right now. OpenAI’s latest beta release, ChatGPT (an AI chat backed by GPT-3.5), stunned everyone when the company unveiled it to the public on November 30, 2022. Here’s an example:
Toloka: Can you please describe sentiment analysis and how it’s enabled by machine learning and data labeling? Imagine that you’re speaking to someone who doesn’t know anything about the subject. Use no more than 5 sentences.
GPT-powered AI chat: Sure! Sentiment analysis is the process of using natural language processing, text analysis, and machine learning techniques to identify and extract subjective information from text data. Sentiment analysis can be used to determine the overall sentiment of a piece of writing, such as whether it is positive, negative, or neutral. Machine learning and data labeling are used to train algorithms to accurately identify and classify the sentiment of text. This involves providing a large amount of labeled data for the algorithm to learn from, so that it can make predictions about the sentiment of new, unseen text.
This may look like a generic response at first glance. It’s not. You can ask GPT-3.5 pretty much any question, and it’ll respond accordingly:
Toloka: Write a very short story or a joke about pizza and Bach.
GPT-powered AI chat: Bach was a great composer, but he had one weakness – he loved pizza. Whenever he sat down to write a new masterpiece, he couldn’t resist ordering a large pepperoni pizza to help him think. His friends laughed and called him “Pizza Bach,” but he didn’t mind. He knew that his love of pizza only made his music more delicious!
Yes, we’ve clearly come a long way since the times of simple chatbots!
GPT and learning
Similarly to BERT, all language models from the GPT family are pre-trained on huge volumes of text (corpora). However, instead of predicting masked words, GPT predicts the next word in a sequence, and at inference time its behavior is guided by prompts. In prompt engineering, the task description is embedded within the input itself, which means that GPT models can, in a sense, autonomously tune themselves, as we’ll see later on.
Let’s say we have this post on Twitter:
I totally hated the new version of my favorite show!
Or we could have a significantly more challenging tweet to understand, because it’s a compound sentence with a double negative in the first part:
I didn’t not like the play, but I wouldn’t necessarily go out of my way to see it again.
For a pre-trained GPT-type language model to deduce meaning in these sentences (especially the second one) and successfully perform sentiment analysis, it has to evolve further through one of two processes.
One is called “p-tuning” (a variant of prompt tuning), which is a type of fine-tuning for GPT-like models. Just like with BERT, this requires labeled data. And again, just like with BERT, this is when platforms like Toloka can provide annotated datasets to those seeking to fine-tune their AI applications.
Another way – which has now become even more popular with GPT models – is to use zero-shot, one-shot, or few-shot learning. The idea is to basically get the model to work out sentiment labels, but provide it with as few examples as possible. This is what is meant by “autonomous tuning” that we mentioned earlier.
This route doesn’t require any labeled data before deployment, that is, before a GPT-backed AI application is implemented. And strictly speaking, this isn’t learning as such – it’s more like adding examples to existing prompts. Rather than needing labeled datasets to hone its skills for a specific downstream application, the model relies on patterns it has already picked up from the billions of words it saw during pre-training (a code sketch of such a prompt follows the examples below):
Zero-shot learning (no examples)
Input sentence: He can’t stand licorice.
Probable sentiment provided by the model: Negative.
One-shot learning (one example)
Input sentence 1: He likes red licorice.
Input sentiment 1: Positive.
Input sentence 2: He can’t stand black licorice.
Probable sentiment 2 provided by the model: Negative.
Few-shot learning (more than one example)
Input sentence 1: He likes red licorice.
Input sentiment 1: Positive.
Input sentence 2: He can’t stand black licorice.
Input sentiment 2: Negative.
Input sentence 3: I like detective stories.
Input sentiment 3: Positive.
Input sentence 4: I don’t like romance novels.
Probable sentiment 4 provided by the model: Negative.
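To make the few-shot setup concrete, here is a minimal sketch of how such a prompt might be assembled and fed to a generative model. The small, publicly available GPT-2 checkpoint is used purely as a stand-in for a much larger GPT-style model, and the prompt format is our own illustrative choice:

# A minimal few-shot prompting sketch with a GPT-style model.
# GPT-2 serves only as a small, freely available stand-in for larger models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The task description and labeled examples are embedded directly in the input.
prompt = (
    "Decide whether the sentiment of each sentence is Positive or Negative.\n"
    "Sentence: He likes red licorice.\nSentiment: Positive\n"
    "Sentence: He can't stand black licorice.\nSentiment: Negative\n"
    "Sentence: I don't like romance novels.\nSentiment:"
)

# The model simply continues the text; the continuation is read as its label.
output = generator(prompt, max_new_tokens=3, do_sample=False)
print(output[0]["generated_text"][len(prompt):].strip())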
Sentiment analysis types
Once this type of learning has been achieved with GPT, or fine-tuning using annotated data has been carried out with BERT, an AI application should be able to perform sentiment analysis. Below are some of the most common sentiment analysis types. Let’s remember that similar tasks are also undertaken by data labelers who provide annotated data during training. The same tasks also come up during evaluation of AI applications after deployment, which we’ll talk about later.
Sentiment detection (polarity analysis)
This is one of the most common types of sentiment analysis. Polarity analysis is when an AI solution needs to determine whether the mood of a given text is negative, neutral, or positive. Often, there’s a corresponding scale of sentiment classification with smileys or emojis:
Coffee with full-cream milk is revolting. [negative sentiment 🙁]
This shop offers several types of coffee machines. [neutral sentiment 😐]
I adore frappuccino with almond milk. [positive sentiment 🙂]
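For reference, here is what polarity analysis looks like with an off-the-shelf model – a minimal sketch using the Hugging Face sentiment-analysis pipeline (its default English checkpoint only distinguishes POSITIVE from NEGATIVE, which is an assumption of this example):

# A minimal polarity-analysis sketch with an off-the-shelf pipeline.
# The default checkpoint labels English text as POSITIVE or NEGATIVE.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

reviews = [
    "Coffee with full-cream milk is revolting.",
    "I adore frappuccino with almond milk.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(review, "->", result["label"], round(result["score"], 3))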
Subjectivity/objectivity detection
During this sentiment analysis task, an AI solution needs to determine whether a sentence or a piece of text is objective, meaning that it contains facts about a particular subject, or subjective, meaning that it conveys someone’s opinion:
Turns out many people residing in California do not know that the Golden State’s capital is not San Francisco, San Diego, or even Los Angeles – it’s actually Sacramento. [objective]
Putting out this inferior device without proper market research was poor decision-making on the part of the executive committee. [subjective]
Emotion detection
This is when an AI solution has to find an emotion present in a sentence or a piece of text. Different downstream applications may have different options to choose from, for instance:
We finally got the new console – yes!!! [excitement]
The news about his sudden departure was disconcerting. [worry/sadness]
I will not go anywhere near this overpriced hellhole ever again! [anger]
Intensity ranking (degrees of emotion)
In this case, an AI solution needs to isolate an emotion and determine how strongly it is expressed (sentiment score). Just like with emotion detection, different scales of positive or negative sentiments may be used, for example:
You will need two to three eggs to make this dish. [none]
Our day at the sea was nice. [moderate – happiness/satisfaction]
I absolutely can’t stand their Park Street branch – there’s always a huge line and only one teller. [strong – frustration/dissatisfaction]
Subject matter detection
This is when an AI solution needs to analyze a sentence or a piece of text to extract its gist or identify a specific topic, for instance:
Aruba is a perfect getaway for anyone looking to take a break from their routine. [tourism/description of a destination]
You will have an hour-long consultation with our anesthesiologist the day before your scheduled surgery. [preparation for a medical procedure]
Our app allows you to browse profiles and meet people in your area. [a dating service ad]
Feature-based analysis
This is a more fine-grained analysis that’s meant to determine opinions about smaller parts of larger entities, such as statements about features of a service or components of a product:
Although the latest model of the X smartphone has great new features, the phone’s battery life leaves much to be desired. [dissatisfaction with the new model’s battery]
The restaurant offers a vibrant menu, and the service is quick, but there’s no parking at all anywhere in the vicinity of the establishment. [dissatisfaction with a parking situation]
My favorite part about these bonus points is that you can actually exchange them for cash! [delight related to a cash-for-bonus scheme]
This type of analysis is sometimes also called aspect-based sentiment analysis. It can be used for opinion mining on certain topics, analyzing customer feedback on online review sites, keeping an eye on news articles, social media monitoring, and so on.
Intent analysis
This analysis is about determining whether the author has a particular goal in mind, such as making a potential purchase:
We were talking among friends and had a disagreement over the best brand of electric guitar. [no purchase intent]
I’d like to join a gym when I move to Miami in the fall. [purchase intent: long term]
Can anyone recommend a pair of decent earphones under $100? I’ve just lost mine. [purchase intent: short term]
Sentiment analysis applications
A sentiment analysis system today has a variety of applications in both research and business. Some of the most common ones are described below.
Recommender systems and targeted ads
Recommender systems are basically suggestion filters that e-commerce stores or online marketplaces rely on every day. These places (like Amazon, eBay, or Alibaba) sell products on the web, and they need recommender systems to grow their revenues. Why? Because we as customers need to get the right offer at the right time, meaning that these offers should result in a purchase.
A trained machine learning model capable of sentiment analysis can improve recommender systems. This is achieved by studying e-commerce users: their profiles, their comments, their reviews, and their questions. During sentiment analysis, our explicit (and sometimes implicit) preferences are extracted. As a result, recommender systems offer us exactly what we’re looking for, as well as things we’re likely to consider even when we aren’t actively searching for a product (which is known as “targeted ads”).
Content moderation
Sentiment analysis tools for content moderation are used for social media platforms (like Facebook or Discord), media platforms (like YouTube, Vimeo, or Soundcloud), and blogging/microblogging websites (like Twitter). It also applies to more specialized communities for industry-specific professionals.
Performing sentiment analysis of social media data automatically can be of tremendous help to moderators on these platforms, who no longer have to process large volumes of text by hand. Instead, a sentiment-analysis-enabled NLP model can flag any messages that either don’t fit their category (for example, a post about dentistry made in a thread for neurologists) or violate community guidelines (foul language, sensitive topics, harassment, and so on).
Brand reputation monitoring
At its core, brand monitoring is quite similar to content moderation, except that rather than moving misposted messages or removing inappropriate ones, companies that perform ML-backed sentiment analysis target customer sentiment. They analyze positive and negative words used by their customers to find out what the customers are saying about the brand and why.
This way, a sentiment analysis tool allows businesses to better understand their customer base, learn from their mistakes, find actionable insights, and improve their products. Some businesses may go as far as actually removing bad reviews, though this is not always possible, and it’s obviously frowned upon. What an affected business can realistically do instead is try to set things right with a disgruntled customer – first identifying that person and then offering them a better version of the product, a gift certificate, an extended warranty, and so on. Analyzing positive sentiment, in turn, helps businesses highlight and thank customers for their positive feedback.
Customer support (tickets and queries)
This is another common application of an ML-enabled sentiment analysis solution that assists businesses the world over, both big and small. These days, companies often utilize tickets when they talk to their customers. This means that every customer who reaches out with a question is assigned a ticket number, and that ticket remains open until the issue has been dealt with.
Applying sentiment analysis allows businesses to quickly examine their incoming tickets and identify a specific subject matter in each customer’s query. This way, companies can learn right off the bat how each customer is feeling, what they want, and what has to be done to solve their problem – all without going through their query pool manually. Importantly, based on this information, businesses can also assign the right employee to deal with every individual case.
Market and academic research
Research is vital in business and academia, and sentiment analysis has brought new opportunities to both fields.
Through market research, companies can learn about their future customers and specific industries by analyzing what people are saying about similar products and competing businesses. Namely, companies can turn to analyzing sentiment to discover their future clients’ pain points, which are basically unmet needs. This is crucial before putting out a new product or launching a new service, because businesses need to establish and verify what’s known as “product-market fit,” i.e., whether what they have in mind corresponds to a strong demand in the marketplace.
In academia, namely in computational linguistics, psychology, sociology, and political science, sentiment analysis via ML allows researchers to study language data at a very large scale and identify phenomena whose identification would ordinarily take immeasurably more time. Insights into categories of language usage, attitudes towards different social groups or controversial issues, human emotions under varying conditions, and even prediction of election outcomes (i.e., popularity polls) are now much easier to obtain, thanks to sentiment analysis.
Training data and sentiment analysis evaluation
Now that we know what sentiment analysis is, how it works, and what language models are behind it, it’s time to have a closer look at the training data that’s fed into these models.
As we’ve already explained in other posts in this blog, labeled/annotated data can be obtained through different approaches and methodologies. Often, as is the case with Toloka, the data is labeled by crowdsourced human annotators (or as we say “crowd contributors”), which is why it’s referred to as “human-labeled data” or “human-handled labeling”.
One of the advantages of crowdsourced data labeling is that the process is much faster compared to other scenarios, and it’s also much more affordable. Sometimes, AI application developers may choose to use ready-made sets of annotated data when they prepare their pre-trained ML models for specific NLP tasks. However, these datasets are often too generic and not current enough for the task at hand, which may result in poorly performing AI applications.
Since sentiment analysis demands a high degree of precision, all training data for downstream applications needs to be of high quality. It’s therefore recommended that the freshest and most up-to-date annotated data be used during fine-tuning and p-tuning of language models, ideally provided by human labelers.
In addition, AI applications need to be evaluated using high-quality annotated data even after fine-tuning. When this evaluation is performed by human labelers, it’s known as “human-in-the-loop monitoring.” This monitoring is necessary to avoid what is known as “data drift.” Since languages, and hence language data, are not static, NLP-based AI solutions need to be checked regularly to make sure they stay current. For example:
I think Mike has become a total badass.
Old interpretation / Negative: Mike is not a nice person to be around.
New interpretation / Positive: Mike is very cool.
The word “badass” now has a new informal meaning, and while this is just a simple example, when datasets contain outdated information about terms and their interpretations (i.e., data drift), sentiment analysis is affected accordingly. And so is every business that relies on it. At the same time, it’s also important to remember that during evaluation/monitoring, different datasets are needed – they cannot be the same datasets that were used for model training.
During human-in-the-loop monitoring and sentiment analysis evaluation, crowd contributors may work with fresh data in two different ways. The first way is to give new texts to the trained model and get it to perform sentiment analysis. The same texts are given to human annotators who also carry out sentiment analysis. The annotators’ answers are considered “golden sets” (i.e., correct answers), which is why they’re then compared to the answers provided by the ML model to determine how accurate that solution is.
The second way is to give the model’s sentiment analysis answers (along with any corresponding input) to human annotators and ask them to rate the model’s performance. Both ways work equally well; however, the first strategy can be utilized to both evaluate and later fine-tune the model if need be, whereas the second strategy can be used only for evaluation but not fine-tuning (though it entails a shorter route).
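Here is a minimal sketch of the first strategy – comparing the model’s answers against a golden set of human labels (the texts, label names, and plain accuracy metric are illustrative assumptions, not a fixed Toloka workflow):

# A minimal sketch of evaluating a model against a golden set of human labels.
model_predictions = {            # labels produced by the trained model
    "I adore frappuccino with almond milk.": "positive",
    "This shop offers several types of coffee machines.": "positive",
    "Coffee with full-cream milk is revolting.": "negative",
}
golden_set = {                   # labels provided by human annotators
    "I adore frappuccino with almond milk.": "positive",
    "This shop offers several types of coffee machines.": "neutral",
    "Coffee with full-cream milk is revolting.": "negative",
}

correct = sum(model_predictions[text] == label for text, label in golden_set.items())
accuracy = correct / len(golden_set)
print(f"Model accuracy against the golden set: {accuracy:.0%}")  # 67%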
Concluding remarks
As we’ve seen, sentiment analysis has a wide range of useful applications in business and academia that are now gaining more and more traction. This is made possible by pre-trained language models such as BERT and GPT (which provide the foundation), as well as annotated data.
Annotated data plays an important role in the fine-tuning/p-tuning of foundation models, that is, preparing these models for task-specific downstream applications such as sentiment analysis. Annotated data is also a must-have in sentiment analysis evaluation (“human-in-the-loop monitoring”) of AI applications after their deployment.
Whether you need to integrate an AI solution that can carry out sentiment analysis (for your own business or as a service to other businesses), or you need to evaluate how up-to-date your model is (i.e., check that there’s no data drift), you can always count on Toloka for effective and affordable human-handled data labeling.
Check out Toloka’s solution for data labeling for natural language processing that provides a list of NLP-related data annotation tasks, including sentiment analysis, that our crowd contributors perform on a daily basis. We also encourage you to have a look at this step-by-step guide on how to set up a sentiment analysis/content moderation project on Toloka.
Article written by:
Natalie Kudan
Updated:
Dec 15, 2022