Toloka Team
Happy experts, better models: Ethical data collection for supervised fine-tuning
One year ago, we introduced AI Tutors in our blog post “Closing the gap: How Toloka’s AI Tutors align LLMs for better results.” Since then, we’ve developed a separate platform dedicated exclusively to AI Tutors, called Mindrift, and expanded the team of AI Tutors with new experts in more languages and over 20 knowledge domains.
AI Tutors are experienced professionals hired on a freelance basis to contribute expertise in writing, editing, and specialized domains such as health sciences, law, STEM, coding, humanities, and more. They are assigned to teams according to their roles: Writers, Editors, and Domain Experts. Together, they collaborate on projects to craft polished prompts and completions for supervised fine-tuning (SFT), producing both single-turn and multi-turn dialogs of varying complexity. With strong domain knowledge, they also add context to the prompts and run fact checks as needed.
Expert contributions translate to better LLM output
AI Tutors are highly educated professionals who build on experience in their fields when creating content for datasets.
96% have a post-secondary degree
47% hold a Master’s degree or higher
1 in 7 has a PhD
Our writers and editors are native speakers with high language accuracy who know how to craft well-written, natural texts. To ensure the data meets the specific needs of each project, all experts follow style guidelines and other requirements defined by the client.
The vast majority of AI Tutors have full-time or part-time jobs in their field of expertise, and they contribute content on Mindrift in their free time to supplement their main income. Some are balancing several freelance or consulting roles. Others, such as full-time parents, welcome the opportunity to flex their professional skills without a set time commitment. Each expert has their own story, and many are featured in the Mindrift blog and podcast.
We’ve found that having a wide variety of contributors with all sorts of real-life experiences helps keep datasets diverse and authentic. We embrace full flexibility for AI Tutors because it makes projects scalable and gives highly qualified professionals more opportunities to use their skills. It comes down to our core values: what’s good for the workforce is good for the AI industry overall.
Diverse backgrounds for unique, unbiased datasets
We hire experts all over the world: 1 in 3 are based in Europe, but the AI Tutors team spans 5 continents. The diversity of the workforce infuses data with a range of cultural context, which helps prevent bias from creeping into the data.
While experts have the freedom to work wherever they want, only 9 percent of our domain experts also generate AI training data on similar or competing platforms. The other 91 percent work exclusively on our platform, contributing domain-specific content that is authentic and unique.
In other words, we can guarantee that data in our custom SFT datasets was not used for training any other models. This complements the overall language quality and expertise inherent in the data, which is reflected in better LLM performance after alignment.
Ethical data production is about happy experts
We design our data production pipelines with three aspects in mind: data quality, scalability, and user experience. While we have metrics for data quality and scalability, we measure user experience through direct feedback. We regularly hear from AI Tutors and respond to their requests and suggestions to improve workflows.
To get the big picture on user satisfaction, we recently ran a survey and received 530 responses from the AI Tutors team. The results were overwhelmingly positive, but we also gained insight on where to focus our efforts to improve the platform and processes.
Overall, AI Tutors give the Mindrift platform 4.3 out of 5 stars. Domain experts are the happiest in the group, with a 4.5 rating for their experience on paid projects.
A strong Net Promoter Score (NPS) of 76.47 among domain experts confirms their enthusiasm about creating data at Mindrift. NPS is a metric for measuring user satisfaction in the range of -100 to +100, based on the question “How likely would you be to recommend the platform to your friends or colleagues?” Anything over 60 is generally interpreted as a strongly positive score.
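For readers who want to see how a score like this is typically derived, here is a minimal sketch of the standard NPS convention: respondents rate the question on a 0 to 10 scale, ratings of 9 or 10 count as promoters, ratings of 0 to 6 count as detractors, and the score is the percentage of promoters minus the percentage of detractors. We are assuming that convention for illustration only. The function name and the sample ratings below are hypothetical and are not taken from the survey data.

```python
def net_promoter_score(ratings):
    """Return an NPS between -100 and +100 for a list of 0-10 ratings.

    Assumes the standard convention: 9-10 are promoters,
    7-8 are passives, and 0-6 are detractors.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)

    # Passives count toward the total but not toward either group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical ratings for illustration only (not real survey responses).
sample_ratings = [10, 9, 9, 8, 7, 10, 6, 9, 10, 3]
print(net_promoter_score(sample_ratings))  # 6 promoters, 2 detractors -> 40.0
```

Because passives dilute the result without adding to it, a score above 60 means that promoters outnumber detractors by a wide margin.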
The enthusiasm of AI Tutors extends to their outlook on the future. Our report on AI and the Workforce from May 2024 highlights that 95 percent of AI Tutors are optimistic about AI’s potential to create new jobs. As they adapt to new opportunities created by the AI boom, they can see the impact of their efforts and anticipate increasing demand for their skills.
How we keep experts engaged
The Mindrift platform is fairly new, and it’s growing fast. Many AI Tutors say they enjoy the mental challenge, and 1 in 4 say they joined Mindrift to apply their skills in a new way.
According to our survey, here’s what experts like best:
Flexibility. They work whenever and wherever they want. There is no time commitment, volume quota, or pressure to perform. To keep things interesting, experts choose the tasks they work on and can vary the topics they write about.
Fair compensation. Pay is based on average earnings for their location and level of expertise. They earn performance bonuses for high-quality work and for completing a steady volume of tasks when demand is high.
Positive collaboration. Team members guide and mentor each other, and QA specialists give individual feedback to help AI Tutors improve. When questions and issues arise, the group leaders and Support team are there to help. The atmosphere is friendly and lighthearted, aiming to support the intense pace of projects without putting stress on the experts.
Easy-to-use platform. Task interfaces and instructions are intuitive and designed with our experts in mind. Built-in co-pilot tools reduce routine work by 45 percent to speed up team efforts and prevent errors.
Here’s what AI Tutors are saying:
“The flexibility is my overall favorite thing. Both in time of working and the ways to make each task different and interesting.”
“The methods of communication and interaction really make it feel like a team.”
“I like the challenges that the pace and requirements add. I also like being able to bend my workday and workweek as needed to fit my life. The bonuses, pay that flexes with complexity and deadlines, and per-item pay scale gamify things a little bit.”
User happiness matters, and not just for retaining a workforce. It’s about supporting a network of knowledgeable professionals who are motivated to add real value to datasets. It’s a critical aspect of responsible AI development.
Trust Toloka’s experts for high-value SFT datasets to align your LLM
Article written by: Toloka Team
Updated: Aug 11, 2024