Ekaterina Artemova

Jul 29, 2024

Insights

Detecting AI-generated content and why it’s vital to LLMs

Advancements in large language models offer many benefits for business and personal tasks, but they also introduce risks such as bias, misuse, and plagiarism. Further training with high-quality data is the main antidote to most of these problems. Given the pressure to produce data and tune models faster, AI-generated training data may seem like a time-saving solution, but it has negative consequences for every player in the GenAI market.

  • Ethical concerns and legal issues. AI-generated data needs to be clearly labeled. If data generated by LLMs is misleadingly sold as human-produced content, it compromises the integrity of data production companies. It also introduces a complex web of copyright and licensing issues.

  • Problems with dataset quality. LLMs often produce inaccurate, low-quality content that needs to be fact-checked and validated before it is included in a dataset.

  • Model collapse. Training models on generated text can degrade the quality of the final model, leading to inferior performance.

AI detection can help guard against unexpected issues caused by artificially generated content.

Challenges in detecting AI-generated text

Many companies developing GenAI applications face the problem of collecting expert data for training. Even with high budgets for unique content written by knowledgeable experts, data quality can vary significantly: there is always a risk that experts submit AI-generated texts instead of writing their own.

The issue is widespread, even at companies that specialize in automated data collection. For instance, Veselovsky et al. ran a task on a crowdsourcing platform asking workers to summarize medical research paper abstracts. They found that 15 out of 46 submitted summaries were likely machine-generated, raising concerns about the authenticity of crowdsourced text data.

While generated texts may be fluent, they differ from human-written texts in several measurable ways (a sketch of how to quantify some of these signals follows the list). Compared to human writing, LLM output can often be recognized by:

  • Limited variation in sentence length, staying in the range of 10 to 30 tokens more often than human-written text.

  • More limited word choices, especially noticeable with smaller LLMs.

  • Overuse of numbers and less use of punctuation.

  • Frequent emotional reactions, like acting surprised or excited.

  • Preference for male pronouns over female pronouns.
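
None of these signals is conclusive on its own, but most of them are easy to quantify. Below is a minimal Python sketch that computes a few of the surface statistics from the list above; the feature set and the naive tokenization are illustrative assumptions, not a production detector.

```python
import re
import statistics

def surface_stats(text: str) -> dict:
    """Compute simple surface statistics sometimes used as weak signals
    of machine-generated text. Purely illustrative, not a detector."""
    # Naive sentence and token splitting; real pipelines use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    words = [t for t in tokens if t.isalpha()]
    return {
        # Low variance in sentence length is one reported LLM tendency.
        "sentence_len_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        # Type-token ratio as a crude proxy for lexical variety.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Shares of punctuation and digit tokens.
        "punct_ratio": sum(not t.isalnum() for t in tokens) / len(tokens) if tokens else 0.0,
        "digit_ratio": sum(t.isdigit() for t in tokens) / len(tokens) if tokens else 0.0,
    }

print(surface_stats("The model wrote this. Then it wrote that. Then it wrote a lot more."))
```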

Research consistently shows that models trained on AI-generated texts perform worse than models trained on human-written data. This is a crucial point: confirming that content is human-generated is essential to the data collection process. At the same time, balancing the ratio between synthetic and human-written samples is a highly challenging task.

Solutions and approaches

When it comes to building a dataset, how can you be sure you’re getting what you pay for? It’s not easy to identify AI-generated text without automation tools, and detectors can be unreliable.

One solution is to collaborate with trustworthy data providers. The Toloka team has a deep understanding of the issue of artificial data, and we are actively working to address it at an industry-wide level.

Our primary goal is to ensure the quality of the data we deliver by checking for AI-generated texts and preventing any form of data manipulation. As part of this effort, we’ve conducted a comprehensive study of best practices for identifying generated content.

Approaches to artificial text detection

Artificial text detection can be grouped into two categories:

1. White-box approaches: model watermarking. In this approach, text is generated under specific constraints: the model selects words from a “green list” and avoids words from a “red list”. These lists are known only to the LM producer and shared with the artificial text detection system. For example, if a student uses a watermarked LM to write an essay, a professor can run a specialized tool to reveal the watermark.

While watermarking is reliable, its adoption depends on universal agreement among LM producers. It also affects text fluency, and edited texts might lose the watermark.
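
To make the green-list idea concrete, here is a toy sketch of the detection side of a Kirchenbauer-style watermark. It assumes the detector shares the producer's secret partitioning scheme (stood in for here by a hash of the previous token) and the green-list fraction; the whitespace tokenization is also a simplification.

```python
import hashlib
import math

GAMMA = 0.5  # fraction of the vocabulary put on the "green list" at each step

def is_green(prev_token: str, token: str) -> bool:
    """Toy stand-in for the producer's secret partition: hash the
    (previous token, current token) pair and call a GAMMA share 'green'."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < GAMMA

def watermark_z_score(tokens: list[str]) -> float:
    """One-proportion z-test, as in Kirchenbauer et al.: count green tokens
    and compare against the GAMMA baseline expected for unwatermarked text."""
    n = len(tokens) - 1  # number of (previous, current) token pairs
    greens = sum(is_green(tokens[i - 1], tokens[i]) for i in range(1, len(tokens)))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# A large positive z-score suggests the text carries this watermark;
# text from other sources should hover around zero.
print(round(watermark_z_score("the quick brown fox jumps over the lazy dog".split()), 2))
```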

2. Black-box approaches. The core idea behind black-box approaches is to distinguish human-written texts from machine-generated ones without access to the weights of the text generation model. They fall into two major groups, depending on the machine learning methods they are based on.

Learnable detectors. The most basic setup is to train a binary classifier to distinguish between human-written and model-generated text. This requires two sources of training data: AI-generated text and human-written text. Such classifiers can achieve around 95% accuracy (TuringBench, M4) but are prone to errors under domain or style shifts and may struggle with texts from unfamiliar models. A more advanced setup is a classifier that aims to identify the source text generation model. Other setups detect the change point between human-written and AI-generated text, e.g. between the prompt entered by a user and the completion generated by a language model (SemEval 2023). As of now, learnable detectors are the state-of-the-art solution, reaching the highest accuracy and F1 scores on most standard benchmarks.
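
As a minimal illustration of the learnable-detector setup, the sketch below trains a TF-IDF plus logistic regression binary classifier with scikit-learn. The four inline examples are placeholders; real detectors are trained on large paired corpora such as TuringBench or M4, and state-of-the-art systems typically fine-tune transformer encoders rather than linear models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus: label 1 = machine-generated, 0 = human-written.
texts = [
    "The study demonstrates significant improvements across all evaluated metrics.",
    "honestly i just skimmed the abstract, the methods section lost me",
    "In conclusion, the findings underscore the importance of this approach.",
    "we tried it twice and it broke both times, no idea why",
]
labels = [1, 0, 1, 0]

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# predict_proba returns [P(human-written), P(machine-generated)] per input
print(detector.predict_proba(["The results highlight a notable upward trend."]))
```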

Metrics-based detectors. These detectors support a zero-shot setup in which no training data is needed. The decision relies on heuristics, such as a model's tendency to assign a higher probability to its own texts, or on distributional information. Metrics-based detectors have several advantages. First, the zero-shot setup eliminates time-consuming and resource-intensive training data collection. Second, they may generalize better to unseen text generation models and unseen domains, lifting the main limitation of learnable detectors. Finally, zero-shot methods are easier to implement and use out of the box, as they require no training and can be deployed immediately.
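
As a sketch of the metrics-based idea, the snippet below scores a text by its average per-token negative log-likelihood under a public proxy model (GPT-2 via Hugging Face transformers) and flags unusually predictable text. The threshold is an illustrative assumption; published zero-shot methods rely on more refined statistics, such as probability-curvature tests.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_nll(text: str) -> float:
    """Average per-token negative log-likelihood of the text under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

# Heuristic: machine-generated text tends to look more predictable (lower NLL)
# to a language model than human text from the same domain.
THRESHOLD = 3.0  # purely illustrative cutoff; must be calibrated per domain
score = avg_nll("The committee reviewed the proposal and approved it unanimously.")
print(f"avg NLL = {score:.2f} -> {'likely machine' if score < THRESHOLD else 'likely human'}")
```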

MGTBench is an open-source implementation of different methods for detecting LLM-generated texts and can be used as a starting point for your own experiments.

Both groups of methods should be calibrated according to their performance on held-out datasets generated by a diverse set of text generation models. This leads to a vicious cycle in which the text generation models improve and the performance of current detectors starts to suffer, to the extent that production systems need to be re-trained from scratch. At the same time, text generation models might be trained with feedback from detectors to make generated text more human-like to fool the detectors.

Challenges in detection:

  • Mixed texts: It is more challenging to detect cases where texts are generated by models but polished by human editors.

  • Detection errors: Detectors sometimes flag non-native writing as AI-generated, or they are fooled by slight modifications to generated texts.

Example: A detector might score machine-generated text very high but significantly lower the score if the text is human-edited.

Toloka’s role in artificial text detection

At Toloka, we use a combination of tools to increase the reliability of AI detection. Our content writers and domain experts are highly skilled and vetted through a rigorous selection process. To help prevent AI use from the start, we provide guidelines and training for experts and implement antifraud mechanisms.

Along with preventing fraud and catching human mistakes, our quality control pipelines focus on detecting AI-generated text as a separate aspect of quality. We take a multi-layered approach to quality, with a combination of strategies like overlap (having more than one expert do the same task), automated quality checks, comprehensive review by experts, and audits of the data. If a text gets flagged as AI-generated, it’s sent back for a full rewrite.
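
As a generic illustration of how overlapping judgments can be combined (a toy majority-vote sketch, not Toloka's actual pipeline), a text might be routed for a rewrite only when several independent checks agree:

```python
def needs_rewrite(flags: list[bool], quorum: int = 2) -> bool:
    """Send a text back for a full rewrite when at least `quorum` of the
    independent checks (expert reviews, automated detectors) flag it."""
    return sum(flags) >= quorum

# e.g., two expert reviewers and one automated detector examined the same text
print(needs_rewrite([True, False, True]))   # True: quorum reached, rewrite
print(needs_rewrite([False, False, True]))  # False: a single flag passes
```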

We currently use public AI detectors while we develop our own benchmarks for better reliability. We plan to make these benchmarks publicly available, because we believe in the value of responsible development of generative AI for all people.

The broader demand for AI detection

The world has reacted with enthusiasm to the mass availability of generative AI, but the phenomenon has introduced new ethical dilemmas around digital content. As LLM use has become ubiquitous, AI detection is now a hot topic in education, research, the workplace, and social media.

Research communities and educational institutions are developing new standards around use of AI, with guidelines for appropriate ways to use AI and add disclaimers to generated text. Examples include policies for authors established by Elsevier and the ACL (the Association for Computational Linguistics).

As social media platforms assume greater responsibility for the well-being of their users, they are also pursuing AI detection methods to help inform users when posted content is generated by AI. For instance, Meta has begun labeling AI-generated content on its platforms. Similarly, platforms with a general audience like Airbnb or Tinder try to flag AI-generated profiles to protect their users.

Over the past two years, mainstream media has continued an ongoing discussion of ethical use of ChatGPT and other LLMs. The possibility of being deceived by AI-generated content is top of mind for anyone interacting with the digital world. Society has changed in response to LLM advancements, and the current capabilities of artificial text detectors are not sophisticated enough to keep up.

What to expect next

Building an artificial text detector is an ongoing challenge. New text generation models are released frequently, and the true authorship of online text is increasingly uncertain. Current state-of-the-art techniques suffer from domain shift and remain brittle outside the domains they were calibrated on.

The odds are tough, but the choice is obvious. We will continue to innovate and improve AI detection methods, because reliable detection is at the heart of responsible AI development.

We’ll keep you updated as new benchmarks become available. Subscribe to our news or join the conversation on social media.

References:

  • Kirchenbauer, John, et al. "A watermark for large language models." International Conference on Machine Learning, PMLR, 2023.

  • Uchendu, Adaku, et al. "Authorship attribution for neural text generation." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

  • Muñoz-Ortiz, Alberto, et al. "Contrasting linguistic patterns in human and LLM-generated text." arXiv preprint arXiv:2308.09067, 2023.

  • Stanford HAI

  • ABC News

Article written by:

Ekaterina Artemova

Updated:

Jul 29, 2024
