Explainer No. 57

How Sentiment Analysis Works

RelayMag | June 2026 | 7 min read

Key takeaways

Sentiment analysis turns feeling into a number, and its methods have arrived in three waves.
Lexicons are fast and traceable but brittle, while transformers read context yet hide their reasoning.
Brand-level scores inherit volume, sampling, and language skews, so read the text behind them.

Sentiment analysis is the attempt to turn human feeling into a number. Software reads a review, a tweet, a support ticket, or a survey answer and tries to decide whether the writer feels positive, negative, or neutral. The promise is appealing. Instead of reading ten thousand reviews by hand, you get a chart. The reality is messier, and understanding how these systems actually work is the best protection against trusting them too much.

There is no single method. The field has gone through three broad waves, each one better at handling language and each one harder to see inside. Knowing how they differ helps you read any sentiment score with the right amount of doubt.

Lexicon and rule-based methods

The oldest approach is also the easiest to explain. You start with a dictionary of words, each one scored as positive or negative. Words like "excellent" and "love" carry a positive weight. Words like "broken" and "awful" carry a negative one. To score a piece of text, the software finds the scored words, adds up the weights, and reports the total.

On its own that is too blunt, so these systems add rules. A negation rule flips the score when it sees "not good" or "never works". An intensity rule boosts the weight when it sees "very" or "incredibly", and softens it for "slightly" or "somewhat". Some lexicons built for social media even score punctuation and capitalization, reading "GREAT!!!" as stronger than "great".
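A minimal sketch of the lexicon-plus-rules idea, in Python. The word list, the weights, and the multipliers here are invented for illustration and are not taken from any published lexicon. The sketch also skips punctuation and capitalization, which some social media lexicons do score.

```python
import re

# Hypothetical toy lexicon; real lexicons hold thousands of scored entries.
LEXICON = {
    "excellent": 2.0, "love": 2.0, "great": 1.5, "good": 1.0, "works": 0.5,
    "broken": -2.0, "awful": -2.0,
}
NEGATORS = {"not", "never", "no"}
# Illustrative multipliers: above 1 boosts the weight, below 1 softens it.
INTENSIFIERS = {"very": 1.5, "incredibly": 2.0, "slightly": 0.5, "somewhat": 0.7}


def lexicon_score(text: str) -> float:
    """Sum the weights of known words, applying negation and intensity rules."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue  # words the lexicon has never seen are simply ignored
        weight = LEXICON[tok]
        prev = tokens[i - 1] if i >= 1 else ""
        prev2 = tokens[i - 2] if i >= 2 else ""
        if prev in INTENSIFIERS:
            weight *= INTENSIFIERS[prev]
            negator_slot = prev2  # handles "not very good"
        else:
            negator_slot = prev   # handles "not good"
        if negator_slot in NEGATORS:
            weight = -weight
        total += weight
    return total


for phrase in ["good", "very good", "not good", "never works", "meh"]:
    print(phrase, "->", lexicon_score(phrase))
```

Every score can be traced back to a dictionary entry and a rule, which is the transparency this approach is known for. The last phrase scores zero because "meh" is not in the toy dictionary.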

The appeal here is speed and transparency. These systems run fast, need no training data, and you can always trace exactly why a sentence got the score it did. That makes them easy to audit and easy to fix. The weakness is that they are brittle. A word means different things in different settings, and a fixed dictionary cannot keep up. Anything the lexicon has never seen, it simply ignores.

Classic machine-learning classifiers

The next wave handed the problem to the data. Instead of writing rules by hand, you collect a large set of examples that people have already labeled as positive, negative, or neutral. A model studies those examples and learns which patterns of words tend to go with which label. Once trained, it predicts a label for text it has never seen.

These classifiers represent text as numbers, often by counting which words appear and how often, sometimes weighting rarer and more telling words more heavily. The model learns from the statistics of the training set rather than from a human-written dictionary. That makes it more flexible than a lexicon, because it can pick up patterns a person might never think to encode.
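To make the idea concrete, here is a small sketch of one classic choice, a naive Bayes classifier over word counts, written with only the Python standard library. The four training sentences and their labels are made up for illustration, and a real system would train on thousands of examples, often with a library rather than hand-rolled code.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(examples):
    """examples: list of (text, label) pairs. Returns the counts used to predict."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Pick the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples, far too few for real use.
examples = [
    ("love this phone", "positive"),
    ("excellent battery and screen", "positive"),
    ("awful battery", "negative"),
    ("broken screen on arrival", "negative"),
]
model = train(examples)
print(predict(model, "love the screen"))
print(predict(model, "awful broken phone"))
```

Nothing in this code says that "love" is a positive word. The model picks that up from the labels, which is both the strength of the approach and the reason it only knows the world its training set describes.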

The cost is the training data itself. Someone has to label thousands of examples, and the model only learns the world those examples describe. Train a classifier on movie reviews and point it at medical feedback, and the accuracy can fall off a cliff. The model also struggles with word order. To a simple counting model, "good, not bad" and "bad, not good" look the same, because it sees the same bag of words in both.
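The word-order problem is easy to reproduce. The short Python sketch below counts words in both phrases and compares the results. The bigram function at the end shows one common remedy, counting adjacent word pairs, which is an addition here for illustration rather than something the discussion above prescribes.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count each word, discarding punctuation and word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def bigrams(text):
    """Adjacent word pairs, which keep a little of the original order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return list(zip(tokens, tokens[1:]))


a, b = "good, not bad", "bad, not good"
print(bag_of_words(a) == bag_of_words(b))  # True: same words, same counts
print(bigrams(a) == bigrams(b))            # False: the pairs differ
```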

Transformers and large language models

The current wave reads context. Modern models, built on an architecture called the transformer, process a sentence as a whole rather than as a pile of separate words. They learn from enormous amounts of text how words relate to one another, so they can tell that "sick" is an insult in one sentence and high praise in another. Large language models take this further and can be asked, in plain instructions, to judge the sentiment of a passage and explain their reasoning.

This is the most capable approach by a wide margin. It handles nuance, longer arguments, and shifts in tone that would defeat the earlier methods. It also comes with real tradeoffs. These models are expensive to run at scale, slower than a lexicon, and far harder to inspect. When one of them labels a sentence as negative, there is rarely a clean, traceable reason you can point to. You are trading transparency for understanding.

Aspect-based sentiment analysis

A single score for a whole review throws away most of what the review says. Consider "great food, slow service". A blunt system averages those into something like neutral, which is true of nothing the customer actually felt.

Aspect-based sentiment analysis fixes this by splitting the text into the things being talked about and scoring each one separately. The same sentence becomes positive on food and negative on service. For a restaurant, a hotel, or a piece of software, this is usually the useful view, because it tells you what to fix rather than handing you one flattened average. It is also harder to do well, since the software has to find the aspects, decide which words apply to which aspect, and only then assign a sentiment to each.
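A deliberately simple Python sketch of those three steps follows. It splits a review into clauses, looks for aspect keywords in each clause, and scores the clause with a tiny lexicon. The aspect keywords and word weights are hypothetical, and production systems typically use trained models rather than keyword matching.

```python
import re

# Hypothetical aspect keywords and sentiment weights, for illustration only.
ASPECTS = {
    "food": {"food", "meal", "menu"},
    "service": {"service", "staff", "waiter"},
}
LEXICON = {"great": 1.0, "delicious": 1.0, "slow": -1.0, "rude": -1.0}


def clause_score(tokens):
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def aspect_sentiment(text):
    """Return a score per aspect by scoring each clause separately."""
    results = {}
    # Step 1: split into clauses on commas, semicolons, "but", and "and".
    for clause in re.split(r",|;|\bbut\b|\band\b", text.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        score = clause_score(tokens)
        # Steps 2 and 3: find the aspect named in the clause, attach the score.
        for aspect, keywords in ASPECTS.items():
            if keywords & set(tokens):
                results[aspect] = results.get(aspect, 0.0) + score
    return results


review = "great food, slow service"
print(aspect_sentiment(review))
print(clause_score(re.findall(r"[a-z]+", review)))  # the flattened whole-review score
```

On this review the per-aspect view gives 1.0 for food and -1.0 for service, while the whole-review sum comes out at 0.0, the neutral that matches nothing the customer felt.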

Why this is genuinely hard

None of these methods comes close to solving the problem, and the reasons are baked into language itself.

Sarcasm and irony invert meaning. "Oh, perfect, another outage" is bitter, but every individual word reads as calm or positive.
Negation hides in long sentences. The "not" that flips a clause can sit far from the word it cancels, and short-context methods miss it.
Comparisons confuse the target. "Their old phone was better than this one" is negative about the new product even though the positive word attaches to the old one.
Domain language breaks dictionaries. In some communities "sick" and "wicked" are compliments, and a general-purpose lexicon scores them as negative.
Mixed sentiment lives in one sentence. Real feedback often praises and complains in the same breath, which is exactly why aspect-based methods exist.
Emoji and slang carry a lot of the signal, and they shift fast. A symbol that read as friendly last year can read as mocking now.
Neutral is genuinely hard. Plenty of text is factual or off-topic, and many systems are quietly bad at telling calm neutrality apart from weak positive or weak negative.

These are not edge cases you can ignore. They show up constantly in real text, and they are a large part of why published accuracy figures vary so widely from one dataset and domain to the next. Treat any single headline accuracy number with caution.
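A few of these failures can be shown with a word-level scorer in Python. The lexicon weights and the third example sentence are invented for illustration. The "human reading" labels are what a person would likely say, not output from any tool.

```python
import re

# Hypothetical general-purpose weights, for illustration only.
LEXICON = {"perfect": 2.0, "better": 1.0, "sick": -1.5, "awful": -2.0}


def word_level_score(text):
    """Score text by summing word weights, with no sense of context."""
    return sum(LEXICON.get(t, 0.0) for t in re.findall(r"[a-z]+", text.lower()))


hard_cases = {
    "Oh, perfect, another outage": "negative (sarcasm)",
    "Their old phone was better than this one": "negative (comparison)",
    "That new track is sick": "positive (slang)",
}

for text, human_reading in hard_cases.items():
    print(f"{word_level_score(text):+.1f}  human reading: {human_reading}  | {text}")
```

In all three cases the sign of the score is the opposite of the human reading, and nothing in the number itself warns you about it.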

Rolling scores up to a brand number

Most people never look at individual scores. They look at a dashboard that says sentiment is 72 this month, up from 68. That number is an aggregate, usually an average or a share of positive versus negative mentions, and the way it is built hides several traps.

Volume bias. A loud week of mentions about one incident can swing the brand number even if nothing about the underlying product changed.
Sampling bias. People with strong feelings, good or bad, are the ones who write reviews and posts. The quiet, satisfied majority is underrepresented, so the average leans toward the extremes.
Language and region gaps. Many tools work best in English and are weaker elsewhere, so a global score can quietly reflect only the markets the software reads well. Whole regions can be missing from a number that claims to cover everyone.

A brand-level score is only as honest as the text feeding it and the model reading that text. When the inputs are skewed, the tidy number on the dashboard inherits every one of those skews without showing them.
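Volume bias in particular is easy to see with arithmetic. The Python sketch below assumes one plausible definition of a brand score, the share of positive mentions among positive and negative ones on a 0 to 100 scale. Real dashboards may define it differently, and the mention counts are invented.

```python
from collections import Counter


def brand_score(labels):
    """Share of positive mentions among positive + negative ones, scaled 0-100."""
    counts = Counter(labels)
    opinionated = counts["positive"] + counts["negative"]
    if opinionated == 0:
        return None  # nothing to aggregate
    return round(100 * counts["positive"] / opinionated)


# A hypothetical ordinary week of mentions.
normal_week = ["positive"] * 72 + ["negative"] * 28 + ["neutral"] * 50

# The same week plus a burst of complaints about a single incident.
incident_week = normal_week + ["negative"] * 44

print(brand_score(normal_week))    # 72
print(brand_score(incident_week))  # 50
```

The ordinary mentions are identical in both weeks. One noisy incident adds 44 negative posts and the headline number drops from 72 to 50.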

How to read a sentiment score

The healthy way to use sentiment analysis is to treat it as a compass, not a verdict. A score is good at telling you which direction things are moving and where to look next. It is not ground truth about how your customers feel.

A few habits keep you honest. Watch changes over time rather than fixating on any single figure, since a trend is more trustworthy than a snapshot. Always check the volume behind a number, because a swing built on a handful of mentions means little. When something moves, go read the actual text, because the words explain the shift in a way the score never can. And stay aware of what is missing, including the languages, regions, and quiet customers the system never captured.
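Two of those habits, watching the change rather than the level and checking the volume behind it, can be sketched as a small Python filter. The thresholds and the monthly figures are hypothetical, and sensible values depend on your own data.

```python
def notable_changes(history, min_mentions=100, min_delta=3):
    """Flag period-over-period moves and say whether the volume supports them.

    history: list of (period, score, mentions) tuples in time order.
    min_mentions and min_delta are illustrative thresholds, not standards.
    """
    flagged = []
    for (_, prev_score, prev_n), (period, score, n) in zip(history, history[1:]):
        delta = score - prev_score
        if abs(delta) < min_delta:
            continue  # small wobble, not worth chasing
        enough_volume = min(prev_n, n) >= min_mentions
        note = "read the text" if enough_volume else "too few mentions"
        flagged.append((period, delta, note))
    return flagged


# Hypothetical monthly dashboard figures.
history = [("Apr", 70, 900), ("May", 68, 850), ("Jun", 72, 880), ("Jul", 60, 40)]
for period, delta, note in notable_changes(history):
    print(period, f"{delta:+d}", note)
```

Here the June rise is backed by plenty of mentions and is worth reading into, while the much larger July drop rests on 40 mentions and should be treated with suspicion.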

Used that way, sentiment analysis earns its keep. It points you at the reviews worth reading and the problems worth chasing. The mistake is asking it to be the answer instead of the place you start looking.

RelayMag is an independent publication on marketing, search, and how companies get found.
