Toloka Team
Join us in building next-gen AI detection benchmarks with top research institutions
We are pleased to announce our collaboration with NLP researchers and industry practitioners, including teams from MIT Lincoln Laboratory, Penn State University, and the University of Oslo, to develop a new benchmark for detecting artificial text.
Why is detecting AI-generated text so important?
Detecting AI-generated text is crucial to prevent the spread of misinformation, maintain the quality of datasets used for training language models, and address ethical and legal concerns. Without detection, AI-generated content can be misrepresented as human-produced, leading to compromised data integrity and complex copyright issues. Moreover, reliance on AI-generated data can degrade the performance of models trained on such data.
Despite extensive research into detecting AI-generated content, many approaches fall short. Read our latest post to learn more about current methods of AI detection and the challenges involved.
Our approach to artificial text detector benchmarking
Our benchmarking project differs from other approaches in that we compare LLM-generated texts to human-edited versions of the same texts. We ask human annotators to edit AI-generated text by checking facts, correcting grammar and awkward phrasing, removing hallucinations, and adding personal touches to make the text engaging and relevant. Together, the two versions form one datapoint in the dataset: the original LLM-generated text and its human-edited counterpart. The text examples cover a variety of use cases, from creative writing to summarizing news articles.
The resulting dataset can be used to build and test new artificial text detectors.
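To make the paired structure concrete, here is a minimal Python sketch of how one datapoint might be represented and used to score a detector. The published schema has not been announced, so the class name `TextPair`, its fields, the 1/0 labeling, and the helper functions are all assumptions made for illustration, and the sample texts are invented placeholders rather than real dataset entries.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class TextPair:
    """Hypothetical datapoint: an LLM output and its human-edited revision."""
    use_case: str        # e.g. "creative writing" or "news summary"
    llm_generated: str   # text as produced by the model
    human_edited: str    # the same text after an annotator's edits


def to_labeled_examples(pairs: Iterable[TextPair]) -> List[Tuple[str, int]]:
    """Flatten pairs into (text, label) rows: 1 = LLM-generated, 0 = human-edited."""
    rows: List[Tuple[str, int]] = []
    for pair in pairs:
        rows.append((pair.llm_generated, 1))
        rows.append((pair.human_edited, 0))
    return rows


def accuracy(detector: Callable[[str], int], rows: List[Tuple[str, int]]) -> float:
    """Fraction of rows where the detector's label matches the true label."""
    if not rows:
        raise ValueError("no examples to evaluate")
    correct = sum(1 for text, label in rows if detector(text) == label)
    return correct / len(rows)


if __name__ == "__main__":
    # Invented placeholder texts, not taken from the real dataset.
    pairs = [
        TextPair(
            use_case="news summary",
            llm_generated="The city council met on Tuesday to discuss the budget.",
            human_edited="Tuesday's council meeting was all about the budget.",
        )
    ]
    rows = to_labeled_examples(pairs)

    # A trivial baseline that flags every text as LLM-generated.
    def always_llm(text: str) -> int:
        return 1

    print(accuracy(always_llm, rows))  # prints 0.5
```

Because every datapoint contributes exactly one text of each class, a detector that always gives the same answer scores 50% accuracy on this layout, which gives a natural baseline for comparing real detectors.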
Get involved
We invite you to contribute to the dataset and make a difference! To get started, click the Start button on the project page and do a trial task.
Get early access to benchmarks
The open dataset will be published in September. Get on the mailing list now to be the first to know when benchmarks are available!
Updated:
Aug 14, 2024