AI might now be as good as humans at detecting emotion, political leaning and sarcasm in online conversations


When we write something to another person, over email or on social media, we may not state things directly; instead, our words may carry a latent meaning, an underlying subtext, that we hope will come through to the reader.

But what happens if an artificial intelligence (AI) system is at the other end, rather than a person? Can AI, especially conversational AI, understand the latent meaning in our text? And if so, what does this mean for us?

Latent content analysis is an area of study concerned with uncovering the deeper meanings, sentiments and subtleties embedded in text. For example, this type of analysis can help us grasp political leanings expressed in communications that may not be obvious to every reader.

Understanding how intense someone’s emotions are or whether they’re being sarcastic can be crucial in supporting a person’s mental health, improving customer service, and even keeping people safe at a national level.


These are only some examples. We can imagine benefits in other areas of life, like social science research, policy-making and business. Given how important these tasks are – and how quickly conversational AI is improving – it’s essential to explore what these technologies can (and can’t) do in this regard.

Research on this question is only just starting. One study shows that ChatGPT has had limited success in detecting political leanings on news websites. Another, which compared sarcasm detection across different large language models – the technology behind AI chatbots such as ChatGPT – found that some are better than others.

Finally, a third study showed that LLMs can guess the emotional “valence” of words – the inherent positive or negative “feeling” associated with them.

Our new study, published in Scientific Reports, tested whether conversational AI, including GPT-4 – a relatively recent version of ChatGPT – can read between the lines of human-written texts.

The goal was to find out how well LLMs simulate understanding of sentiment, political leaning, emotional intensity and sarcasm – thus encompassing multiple latent meanings in one study. We evaluated the reliability, consistency and quality of seven LLMs, including GPT-4, Gemini, Llama-3.1-70B and Mixtral 8x7B.

We found that these LLMs are about as good as humans at detecting sentiment, political leaning, emotional intensity and sarcasm. The study involved 33 human subjects and 100 curated items of text.
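As a rough illustration of what such a rating task looks like in practice – this is a minimal sketch, not the study’s actual protocol, and the model name, prompt wording and rating scale are assumptions chosen for the example – a single text item might be scored like this:

```python
# A minimal sketch of asking an LLM to rate one text item, assuming the
# OpenAI Python SDK is installed and an API key is set in the environment.
# The prompt wording, model name and rating scale are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

text_item = "Oh great, another Monday. Exactly what I needed."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You rate short texts. Reply with a number from 1 (very negative) "
                    "to 7 (very positive) for sentiment, then 'yes' or 'no' for sarcasm."},
        {"role": "user", "content": text_item},
    ],
    temperature=0,  # reduce run-to-run variation in the rating
)

print(response.choices[0].message.content)
```

Collecting many such machine ratings and comparing them, item by item, with ratings from human participants is broadly how agreement between people and models can be quantified.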

For spotting political leanings, GPT-4 was more consistent than humans. That matters in fields like journalism, political science, or public health, where inconsistent judgement can skew findings or miss patterns.

GPT-4 also proved capable of picking up on emotional intensity and especially valence. Whether a tweet was composed by someone who was mildly annoyed or deeply outraged, the AI could tell – although a human still had to confirm whether the assessment was correct, because the models tended to downplay emotions.

Sarcasm remained a stumbling block for both humans and machines. The study found no clear winner there – hence, using human raters doesn’t help much with sarcasm detection.

Why does this matter? For one, AI like GPT-4 could dramatically cut the time and cost of analysing large volumes of online content. Social scientists often spend months analysing user-generated text to detect trends. GPT-4, on the other hand, opens the door to faster, more responsive research – especially important during crises, elections or public health emergencies.

Journalists and fact-checkers might also benefit. Tools powered by GPT-4 could help flag emotionally charged or politically slanted posts in real time, giving newsrooms a head start.

There are still concerns. Transparency, fairness and political leanings in AI remain issues. However, studies like this one suggest that when it comes to understanding language, machines are catching up to us fast – and may soon be valuable teammates rather than mere tools.

Although this work doesn’t claim conversational AI can replace human raters completely, it does challenge the idea that machines are hopeless at detecting nuance.

Our study’s findings do raise follow-up questions. If a user asks the same question of AI in multiple ways – perhaps by subtly rewording prompts, changing the order of information, or tweaking the amount of context provided – will the model’s underlying judgements and ratings remain consistent?

Further research should include a systematic and rigorous analysis of how stable the models’ outputs are. Ultimately, understanding and improving consistency is essential for deploying LLMs at scale, especially in high-stakes settings.
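One simple way to begin such an analysis – sketched below with invented numbers rather than data from our study – is to rate the same item under several reworded prompts and measure how much the scores spread:

```python
# A toy sketch of checking rating stability across reworded prompts.
# The ratings below are invented for illustration; in practice each list
# would come from repeated model calls with paraphrased instructions.
from statistics import mean, stdev

ratings_per_item = {
    "item_01": [5, 5, 4, 5],  # ratings of one text under four prompt wordings
    "item_02": [2, 6, 3, 5],  # a much less stable set of ratings
}

for item, ratings in ratings_per_item.items():
    print(f"{item}: mean={mean(ratings):.1f}, spread (std dev)={stdev(ratings):.2f}")
```

A small spread across rewordings would suggest the model’s judgement is robust to phrasing; a large one would be a warning sign for high-stakes use.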

This collaboration emerged through the COST OPINION network. We extend special thanks to network members Ljubiša Bojić, Anela Mulahmetović Ibrišimović and Selma Veseljević Jerković for their help with this article.