Text mining and sentiment analysis

Natalie Kudan


A few years ago, "data" and "big data" became buzzwords. What was far less clear was how to process all that data, which made it hard to predict how much insight could actually be gleaned from it.

Things have changed dramatically since then. Think about how often a company's products or customer service are mentioned in social media posts, news articles, blog posts, online reviews, and forums, or discussed in internal and external emails, product marketing collateral, public relations content, presentations, and other documents. All of that text can be mined for business purposes and text analytics, taking decision-making to the next level.

But this is where the problem starts. Although vast amounts of data are produced every day, most of it is unstructured: it doesn't fit into a predefined data model, which makes it hard to process and to extract useful insights from.

The good news is that there are now tools like text mining and sentiment analysis that make it possible to automate the process of collecting, organizing, and analyzing massive amounts of business-related data. In this article, we will talk about text mining techniques, sentiment analysis types, why they are important, and how to apply them to gain actionable business insights.


Text mining vs. sentiment analysis: why do we need both?

Before businesses can use unstructured data, it first has to be given some structure. Text mining, or text data mining, is the process of extracting relevant information and actionable insights from text to help businesses make better decisions.

Sentiment analysis, also known as opinion mining, uses machine learning (ML) and natural language processing (NLP) to categorize text as positive, negative, or neutral, or into finer-grained categories such as specific emotions.

Customers who visit a business's website to learn more about its offerings or make purchases leave a trail behind them with every click they make. Since that information can help determine the success or failure of business moves, assist in developing a data-driven strategy, improve the user experience, and more, it would be foolish to ignore it. Let's break down the process.

The first step is to simply collect the information. Once that's done, text mining can be used to sort through and categorize it. Only then should sentiment analysis be leveraged to delve into deeper meaning and nuance.

Sentiment analysis in text mining can thus be applied to massive pools of data to identify subjective information such as customers' opinions and feelings. That helps with product development, improving customer experience (CX), increasing operational efficiency, and simplifying data analysis in general.

For example, customer sentiment in social media posts might be scored as follows:

  • Positive words and phrases like "great customer service" or "convenient location" express positive sentiment and add to the positive sentiment score.
  • Negative words like "awful", "unfriendly", or "unwelcoming" add to the negative sentiment score. Sarcasm also falls into this category, although detecting it is a much harder text analytics task.
  • Words like "average" or "fine" are sometimes counted as neutral and add to a neutral score. This happens when a more detailed scale is used instead of a binary one with only positive and negative scores.

One simple approach to sentiment analysis is for developers to create their own sentiment lexicon: a list of words and phrases marked as positive or negative. Of course, this is an oversimplification, and in ML-based solutions the logic under the hood is usually much more complicated. ML models are trained on large amounts of carefully selected training data so that they can perform complex tasks such as key phrase extraction and feature extraction, analyze unstructured text while taking word frequency into account, and determine the overall sentiment of a text to provide valuable insights.
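To make the lexicon idea concrete, here is a minimal sketch of lexicon-based scoring, assuming a tiny hand-written word list. The word sets and the function names `sentiment_scores` and `overall_sentiment` are hypothetical and for illustration only; as noted above, production systems typically rely on trained ML models rather than a short list like this.

```python
import re

# Hypothetical toy lexicon; a real project would use a much larger,
# domain-specific word list or a trained model instead.
POSITIVE = {"great", "convenient", "friendly", "helpful"}
NEGATIVE = {"awful", "unfriendly", "unwelcoming", "slow"}
NEUTRAL = {"average", "fine"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_scores(text: str) -> dict[str, int]:
    """Count how many tokens fall into each lexicon category."""
    scores = {"positive": 0, "negative": 0, "neutral": 0}
    for token in tokenize(text):
        if token in POSITIVE:
            scores["positive"] += 1
        elif token in NEGATIVE:
            scores["negative"] += 1
        elif token in NEUTRAL:
            scores["neutral"] += 1
    return scores


def overall_sentiment(text: str) -> str:
    """Label the text by whichever of positive or negative dominates."""
    s = sentiment_scores(text)
    if s["positive"] > s["negative"]:
        return "positive"
    if s["negative"] > s["positive"]:
        return "negative"
    return "neutral"


print(overall_sentiment("Great customer service and a convenient location!"))  # positive
print(overall_sentiment("The staff were unfriendly and the wait was awful."))  # negative
print(overall_sentiment("The food was fine, about average."))  # neutral
```

A scheme this simple cannot handle sarcasm or negation ("not great"), which is exactly why ML-based models are preferred in practice.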

An excellent illustration is when a company uses sentiment analysis on customer feedback to learn more about what their existing or potential customers think of their products and services. They might use surveys, emails, social media, news sites, or feeds. We'll take a closer look at some cases later.

What about aspect-based sentiment analysis?

There's also a special kind of sentiment analysis called aspect-based sentiment analysis. Instead of assigning a single overall sentiment to a piece of customer feedback, it determines the sentiment expressed toward each aspect of a product or service. Here's the difference:

  • Sentiments represent what customers feel, i.e. positive, neutral or negative opinions (sentiment scores).
  • Aspects represent specific categories, features, or topics mentioned by customers.
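The distinction can be sketched in a few lines, assuming a hypothetical keyword list for each aspect and the same kind of toy lexicon as before. The sketch splits a review into clauses, scores each clause, and attributes that score to whichever aspects the clause mentions. Real aspect-based systems usually learn both the aspects and their sentiment from labeled data rather than from keyword lists.

```python
import re

# Hypothetical aspect keywords and sentiment lexicon, for illustration only.
ASPECTS = {
    "service": {"service", "staff", "support"},
    "location": {"location", "parking"},
    "price": {"price", "expensive", "cheap"},
}
POSITIVE = {"great", "convenient", "friendly", "cheap"}
NEGATIVE = {"awful", "unfriendly", "expensive", "slow"}


def aspect_sentiments(review: str) -> dict[str, str]:
    """Assign a sentiment label to each aspect mentioned in a review.

    The review is split into clauses at punctuation and at "but"; each
    clause's sentiment is attributed to the aspects it mentions. If an
    aspect appears in several clauses, the last clause wins.
    """
    results: dict[str, str] = {}
    clauses = re.split(r"[.,;!?]|\bbut\b", review.lower())
    for clause in clauses:
        tokens = set(re.findall(r"[a-z']+", clause))
        score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect, keywords in ASPECTS.items():
            if tokens & keywords:
                results[aspect] = label
    return results


print(aspect_sentiments(
    "Great location, but the staff were unfriendly and the price was expensive."
))
```

For the sample review, the location aspect comes out positive while service and price come out negative, even though a single overall score would blur those opinions together.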

What technologies are at the heart of text mining and sentiment analysis?

In this section, we will look at the core technologies behind sentiment analysis in text mining and learn more about how they work.

Artificial intelligence (AI) and machine learning

ML is a subfield of AI and computer science that imitates the way humans learn. Rather than relying on explicit programming, it uses data and algorithms to gradually improve performance on a set of tasks. At the heart of most modern AI solutions is an ML model produced by a training algorithm: a set of coded instructions from ML engineers that tells the system how to learn from data.


NLP is the subfield of AI that enables computers to understand text and spoken words in much the same way humans do. It combines computational linguistics with statistical, machine learning, and deep learning models so that computers can process text or voice data.

Until about five years ago, language models couldn't understand meaning in context, that is, how the different words in a sentence relate to one another. Today's language models can, and they provide a solid foundation for sentiment analysis, but they are general-purpose rather than task-specific. For an AI application to carry out a specific NLP task such as sentiment analysis, its model needs further training on task-specific data. Let's see how that works.

How to perform sentiment analysis

Most NLP models are trained on annotated datasets, which serve as information-packed textbooks full of practical examples and definitions. This means ML engineers have to feed the model a large amount of labeled data, training it to process text and draw conclusions about the relationships between words and their connotations. Typical text labeling tasks include language detection, part-of-speech tagging, named entity recognition, and so on.
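To show how a model can learn from labeled examples instead of a hand-written lexicon, here is a minimal sketch using multinomial Naive Bayes, a classic and simple supervised text classifier. This is a swapped-in illustrative technique, not the deep learning models discussed above, and the six training examples and the `NaiveBayesSentiment` class are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled examples; real training sets contain thousands of
# annotated texts.
TRAINING_DATA = [
    ("great product, works perfectly", "positive"),
    ("friendly staff and fast delivery", "positive"),
    ("love it, highly recommend", "positive"),
    ("awful quality, broke after a day", "negative"),
    ("slow delivery and rude staff", "negative"),
    ("terrible, would not recommend", "negative"),
]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(word | label)
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayesSentiment().fit(TRAINING_DATA)
print(model.predict("fast delivery, great staff"))  # positive
print(model.predict("rude and slow"))  # negative
```

The model never receives a list of "positive words"; it infers word-label associations from the annotated examples, which is why the quality of those labels matters so much.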

These annotated datasets are used to fine-tune the model so it can accurately predict sentiment labels, that is, the categories (such as positive, negative, or neutral) that describe the ideas, feelings, or topic-specific messages a piece of text carries. The datasets can be obtained in a variety of ways, including crowdsourcing on platforms such as Toloka.
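When labels are crowdsourced, each text is commonly shown to several annotators, and their answers are then aggregated into one label. The sketch below uses simple majority voting, the most basic aggregation method; more sophisticated methods also weight each annotator by estimated reliability. The reviews, the labels, and the `majority_vote` helper with its `min_agreement` threshold are hypothetical, and this is not a description of how Toloka itself aggregates labels.

```python
from collections import Counter

# Hypothetical crowd labels: each review was labeled by three annotators.
crowd_labels = {
    "review_1": ["positive", "positive", "neutral"],
    "review_2": ["negative", "negative", "negative"],
    "review_3": ["positive", "negative", "neutral"],
}


def majority_vote(labels: list[str], min_agreement: float = 0.5):
    """Return the most common label, or None if agreement is too low."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None


# Items that come back as None could be sent out for more labeling or dropped.
dataset = {item: majority_vote(labels) for item, labels in crowd_labels.items()}
print(dataset)
```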

Once this phase of training is complete, the language model is tailored to a specific NLP task such as sentiment analysis. A sentiment analysis model for text data is therefore an ML-based NLP tool: it has been trained to analyze texts for their emotional and other polarities, from positive to negative, so it can detect them on its own.

With enough well-labeled examples, sentiment analysis models can learn to go beyond literal meanings by taking into account nuances such as tone, sarcasm, and misused words.
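A trained model picks up contextual cues like these from data. To illustrate why context matters at all, here is a minimal sketch of one hand-written rule, negation handling, that flips the polarity of words appearing shortly after "not", "never", and similar words. The word lists, the three-token window, and `score_with_negation` are hypothetical simplifications, and a rule like this cannot catch sarcasm.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "awful", "rude"}
NEGATORS = {"not", "never", "no", "isn't", "wasn't", "don't"}


def score_with_negation(text: str, window: int = 3) -> int:
    """Sum word polarities, flipping any within `window` tokens after a negator."""
    score, flip_left = 0, 0
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            flip_left = window  # start (or restart) the negation scope
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)  # +1, 0, or -1
        if flip_left > 0:
            polarity = -polarity
            flip_left -= 1
        score += polarity
    return score


print(score_with_negation("The support was good"))  # 1
print(score_with_negation("The support was not good"))  # -1
```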

How can text data benefit businesses?

There are undoubtedly many more ways to use sentiment analysis to gain data-driven business insights than we can cover in this article. We do, however, hope that the examples we provide will help you better understand what sentiment analysis and text mining are capable of.

Moderating content

Sentiment analysis performed on Facebook, YouTube, Twitter, and SoundCloud, as well as in more niche communities for industry professionals, helps identify posts that don't adhere to the platform's guidelines. As this is an automated process, it can be a huge time saver for moderators who would otherwise have to read through large volumes of text data.
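A very simplified version of this workflow can be sketched as a filter that routes suspicious posts into a human review queue. The word lists, the `max_negative` threshold, and `needs_review` are hypothetical; real platforms use trained classifiers and their own policies, so treat this only as an outline of the flag-then-review flow.

```python
import re

# Hypothetical word lists; real moderation relies on trained classifiers
# and platform-specific policies rather than short lists.
NEGATIVE = {"awful", "hate", "stupid", "idiot"}
BANNED = {"idiot"}


def needs_review(post: str, max_negative: int = 2) -> bool:
    """Flag a post if it contains a banned term or too many negative words."""
    tokens = re.findall(r"[a-z']+", post.lower())
    negative = sum(token in NEGATIVE for token in tokens)
    return any(token in BANNED for token in tokens) or negative >= max_negative


posts = [
    "Loved the new episode!",
    "You are an idiot.",
    "I hate this stupid update, it's awful.",
]
# Only flagged posts reach human moderators.
review_queue = [p for p in posts if needs_review(p)]
print(review_queue)
```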

Monitoring brand reputation

Brand monitoring is similar to content moderation in many ways. Instead of simply blocking inappropriate or irrelevant information, however, businesses use sentiment analysis to focus on how customers feel about their brand and track changes in perception over time. They do this by parsing customer reviews for both positive and negative sentiments in order to better understand what customers like and dislike.

You can also use comparative brand analysis to keep tabs on industry developments, or learn whether a given advertising campaign is yielding positive results and how consumers are responding to a newly released product.
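Tracking perception over time usually comes down to aggregating per-mention sentiment scores by period. Here is a minimal sketch, assuming each brand mention has already been scored by a sentiment model as +1, 0, or -1; the dates, scores, and the `daily_sentiment` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical brand mentions, already scored by a sentiment model:
# +1 = positive, 0 = neutral, -1 = negative.
mentions = [
    (date(2023, 3, 1), 1), (date(2023, 3, 1), 1), (date(2023, 3, 1), -1),
    (date(2023, 3, 2), -1), (date(2023, 3, 2), -1), (date(2023, 3, 2), 0),
]


def daily_sentiment(scored_mentions):
    """Average the sentiment of all brand mentions for each day, in date order."""
    by_day = defaultdict(list)
    for day, score in scored_mentions:
        by_day[day].append(score)
    return {day: round(mean(scores), 2) for day, scores in sorted(by_day.items())}


trend = daily_sentiment(mentions)
for day, avg in trend.items():
    print(day.isoformat(), avg)
```

A drop like the one between these two sample days is the kind of signal that would prompt a closer look at what customers were saying.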

Gaining insights from customer reviews

This is closely related to brand reputation monitoring, but it focuses solely on customer feedback and reviews drawn from survey responses, customer service data, blogs, social media, and elsewhere.

Because sentiment analysis can process thousands of pieces of feedback in minutes, far faster than a team of people could, companies can quickly identify areas for growth. They can enhance existing offerings and even reward customers for their feedback, both positive and negative.

Recommendation systems and targeted ads

Recommendation systems are essentially suggestion filters that e-commerce stores and online marketplaces such as Amazon or eBay rely on every day. How do they work? These systems analyze users' explicit and implicit product preferences to learn about their tastes, and sentiment analysis contributes by interpreting what users say about products in reviews and comments. The result is timely, relevant product suggestions: in other words, the right offer at the right time.

"We had two goals: get high-quality data we could use to train the recommendation system for our e-commerce platform and measure the accuracy of our current recommendation algorithm. Toloka helped us improve our model with super-fast labeling of tens of thousands of products from our store. Toloka makes the data problem easier so we can focus on our algorithms".

– Ivan Lapitsky, Project Manager, Yandex Market
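One way review sentiment can feed a recommender is to treat it as a rating signal in item-based collaborative filtering: products are compared by how similarly users feel about them, and a user is offered unseen products that resemble the ones they reviewed positively. The sketch below assumes sentiment scores between -1 and 1 have already been produced by a sentiment model; the users, products, scores, and helper functions are hypothetical, and real marketplaces combine many more signals.

```python
import math

# Hypothetical user -> {product: sentiment of that user's review} data,
# assumed to come from a sentiment model (scores between -1 and 1).
review_sentiment = {
    "ann": {"headphones": 0.9, "speaker": 0.8, "phone_case": -0.5},
    "ben": {"headphones": 0.7, "speaker": 0.9},
    "cat": {"headphones": 0.8, "charger": 0.6},
}


def item_vectors(data):
    """Turn user->item scores into item->{user: score} vectors."""
    vectors = {}
    for user, items in data.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, data, top_n=2):
    """Score unseen items by similarity to items the user reviewed positively."""
    vectors = item_vectors(data)
    seen = data[user]
    scores = {}
    for candidate in vectors.keys() - seen.keys():
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * s
            for item, s in seen.items()
            if s > 0  # only positively reviewed items count
        )
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [item for item in ranked if scores[item] > 0][:top_n]


print(recommend("ben", review_sentiment))  # ['charger']
```

Here "phone_case" is not suggested to ben because the only user who reviewed it felt negatively about it while liking the products ben liked.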

Prioritizing customer issues in tickets and queries

Sentiment analysis is also commonly used in customer service and support. Incoming chatbot conversations, emails, phone calls, and web queries are analyzed, sorted by topic and urgency, and forwarded to the appropriate department or employee.

Staff members don't have to search through query pools manually, and sentiment analysis helps ensure that the most urgent problems are addressed first.
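The routing-and-prioritizing step can be sketched as follows, assuming hypothetical keyword lists for topics, negative sentiment, and urgency. Each ticket gets a department and a priority score, and the queue is sorted so the most urgent tickets come first. A production system would use trained topic and sentiment classifiers instead of the `triage` helper shown here.

```python
import re

# Hypothetical routing rules and word lists, for illustration only.
TOPICS = {
    "billing": {"refund", "charge", "charged", "invoice", "payment"},
    "delivery": {"delivery", "package", "shipping", "late"},
}
NEGATIVE = {"angry", "terrible", "unacceptable", "broken", "late"}
URGENT = {"urgent", "immediately", "asap", "now"}


def triage(ticket: str) -> tuple[str, int]:
    """Return (department, priority score) for a support ticket."""
    tokens = set(re.findall(r"[a-z']+", ticket.lower()))
    # Route to the first matching topic, or to a general queue.
    department = next(
        (dept for dept, words in TOPICS.items() if tokens & words), "general"
    )
    # Negative sentiment raises priority; explicit urgency raises it more.
    priority = len(tokens & NEGATIVE) + 2 * len(tokens & URGENT)
    return department, priority


tickets = [
    "How do I change my shipping address?",
    "I was charged twice, this is unacceptable. Refund me immediately!",
    "My package is late and the item arrived broken.",
]
# Handle the highest-priority tickets first.
queue = sorted(tickets, key=lambda t: triage(t)[1], reverse=True)
for t in queue:
    print(triage(t), t)
```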

Conducting market research

Before releasing a product or launching a service, businesses usually conduct extensive market research to determine if there is sufficiently strong demand for their proposed offering. This is where sentiment analysis comes in handy.

By collecting data on customer feedback about competing products and services, companies can gain valuable insight into their target market's pain points and unmet needs, and then tailor their offerings to be more relevant to that market.
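A simple way to surface pain points from competitor reviews is to count which product features come up most often in negatively labeled feedback. Here is a minimal sketch, assuming the reviews have already been labeled by a sentiment model; the reviews, the `FEATURES` list, and the `pain_points` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical competitor reviews already labeled by a sentiment model,
# plus a hypothetical list of product features to look for.
reviews = [
    ("Battery dies fast and the app keeps crashing", "negative"),
    ("Love the design, battery is fine", "positive"),
    ("Battery life is awful", "negative"),
    ("App is confusing and support never answered", "negative"),
]
FEATURES = {"battery", "app", "design", "support", "price"}


def pain_points(labeled_reviews, top_n=3):
    """Count how often each feature is mentioned in negative reviews."""
    counts = Counter()
    for text, label in labeled_reviews:
        if label == "negative":
            tokens = set(re.findall(r"[a-z']+", text.lower()))
            # Sorting makes the tie order in most_common() deterministic.
            counts.update(sorted(tokens & FEATURES))
    return counts.most_common(top_n)


print(pain_points(reviews))
```

In this sample, battery and app problems stand out as the most frequent complaints, which would point a new entrant toward where competitors fall short.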

Wrapping up

Data collection and analysis can be daunting tasks, but they are essential if a company hopes to maintain its position as a market leader.

As we've seen, there's a powerful solution: text mining combined with sentiment analysis. Together, they can turn many kinds of text into actionable insights without the months-long slog of manual review.

