Large language models are the most powerful AI solutions available. With great power comes great responsibility, and model producers rely on fine-tuning to make model output accurate and beneficial to end users. Human alignment can be a vital part of fine-tuning — a strong antidote to LLM flaws like hallucination and bias.
Toloka is building datasets of human prompts and completions for fine-tuning and evaluating LLMs. The prompts come from a diverse crowd of thousands of Tolokers, representing 69 countries. The completions for these prompts, in contrast, are written and carefully reviewed by curated teams of experts with specialized skills in writing, editing, fact-checking, and domain-specific knowledge. They focus on crafting thorough responses in a collaborative workflow that promotes quality writing.
Who are the experts, and how do we leverage their diverse set of skills? To answer these questions, we talked to several AI Tutors on our writing team. We'll share their insights into the process of writing for generative AI.
Generative language models learn language almost like children do, through exposure to a vast amount of language input. These models don't need us to teach them grammar and vocabulary because they master the language during pre-training. The next step is fine-tuning, where they learn what good texts look like and gain in-depth knowledge in specific areas, more like sending a student to high school or college.
This is why we call our experts AI Tutors — they share their skills with the model to fill in the gaps, making the model's output more sophisticated and refined. As domain experts and professional writers, they offer contextual understanding and a discerning eye to improve the data used for tuning LLMs.
Feyaza, one of the editors on our AI Tutors team, sees human experts as playing the role of the playwright directing AI. “I think even if AI were to be the original source for the canonical copy of that play or book or whatever, there's always a consultant role for humans,” she muses.
The stars of today's post are Feyaza, Luis, and Nicole, writers and editors on the AI Tutors team. We learned about their backgrounds and what it's like writing for AI.
Each expert has to take rigorous writing tests before they can accept tasks. If they pass, they complete onboarding and training to learn how and what to write for training generative AI. We provide detailed guidelines so they know exactly what makes a good response.
One of the main jobs of AI Tutors is to write responses to prompts (called prompt completions). When they open a task, they see guidelines about the voice they should write in and other parameters. For example, the task guidelines might say "Imagine you are a helpful AI assistant." The next part of the task gives them a question to answer, or sometimes an AI response to check and verify.
Writing skills are essential, but not enough to do the job right. It's like having the fuel but no car — you can't get to where you want to go. The missing element is fact-checking skills, which is why we test and train experts on their ability to track down and verify information online. You can find just about anything published on the internet, both true and untrue, and experts have to separate fact from fiction.
Once they start writing, the experts get continuous feedback from their team leads so they can keep improving the quality of their responses. They can also collaborate with the team when they encounter a challenging prompt and need ideas for how to approach the task.
AI writing is challenging work. Luis explains, “Speed and accuracy are key. Not only do we need to come up with a quality piece in less than an hour, but it also has to follow the rules and writing guidelines. These guidelines are in place so that the bank of texts we are creating can be a good training set for an ideal language model.”
Every single piece of writing is checked by a team lead. Team leads make sure that writers follow the style guidelines, and they coach writers on the preferred tone of voice. Their job is to make sure that everyone understands what counts as good writing and what is a little off. When the team leads have questions about difficult cases, they consult with the editor-in-chief and other leads. The team also does spot checks of the final results to make sure the overall quality is consistent.
The writers are assigned to teams of about 10 people so they can work closely together. As a team lead, Feyaza says, “One of the most challenging things is trying to be as supportive as you can for the team because everyone is in different time zones and everyone needs different things. We have people who are older, and we have people who are very young and this is their first job. So it's quite a nice mix, but you have to be available for them in every capacity that they need. You've got to really bring your A-game.”
Manual checking is the most reliable way to control the quality of unique writing tasks, but we also calculate some automated metrics to check results before we feed them to the model. For example, we check readability, and we run AlpacaEval to compare the expert answers to output from an LLM. This approach has limitations, but it gives us a quick estimation of overall quality in the context of modern LLMs.
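To give a feel for what an automated readability check can look like, here is a minimal Python sketch. It is an illustration only, not our production pipeline: it assumes the Flesch Reading Ease formula as the readability metric, and the function names `count_syllables` and `flesch_reading_ease` are hypothetical. The syllable counter is a rough vowel-group heuristic, so scores are approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (as in "make") as non-syllabic,
    # but keep the final syllable in words like "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Compute the Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Institutional accountability necessitates comprehensive documentation."
    # The short, plain sentence should score as easier to read than the dense one.
    print(flesch_reading_ease(simple) > flesch_reading_ease(dense))
```

A score like this can only flag texts that are unusually hard to read. It says nothing about accuracy or tone, which is why manual review stays at the center of the process.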
Similar to the ChatGPT training process, our AI Tutors are trained to focus on 3 criteria in addition to language and writing style: every text should be helpful, honest, and harmless.
Helpful means that the response directly answers the question, and honest means that it's accurate and comprehensive. Tutors provide reliable sources for every answer they write, so fact-checking is built into the process. We also guide tutors to give objective responses to subjective prompts.
Harmless is a trickier category to pin down. Every response must be ethical, meaning it can't offend anyone or lead to negative consequences, either directly or indirectly. For instance, a prompt that stood out for Nicole was a task to write an email impersonating a colleague. Her response explained that the request was dishonest and unethical.
Luis shared how he learned to craft careful responses:
“I once was tasked to write a response to a prompt asking for a weight loss workout plan. What I failed to realize was the goal of the prompter was to lose a lot of weight in a very short time, which is very dangerous. Sometimes consequences are not obvious and the dangers are not always clear. It can take a lot of nuanced decision-making.”
The AI Tutor training process goes a step beyond general safety by focusing on sensitive topics like politics, medicine, financial advice, religion, gender identity, discrimination, and others. For each of these topics, AI Tutors are coached to write a balanced response and advise the reader to consult a professional. With practice and feedback, they learn to spot ethical issues and tag them as “sensitive” in the dataset.
For most of our experts, it's more than just a way to earn money remotely. They find purpose and meaning through their role in developing generative AI.
Luis: No one complained when the wheel was invented. It allowed us to become more productive and freed up time for us to tackle bigger goals and more creative pursuits. AI will have the same effect on our society. As an AI tutor, we are helping invent the wheel. In the decades to come, LLMs will become as common as the wheels on a car. This team, and people like us, are making sure that the wheel works and is as polished as it can be.
Nicole: I hope that I am improving tools for human creatives and contributing in a meaningful way that will help people. This role also helps me to enhance my writing and editing.
Feyaza: I just want to be in the AI world, you know? Just to see where this journey goes. Also, I'm learning new things every day. It's constantly refreshing my brain. I'm a writer, and what I love about this is that it hones my skills and makes them really, really sharp.
An important piece of the quality puzzle is making sure that writers are motivated and know the value of the effort they put out. We asked our experts to explain why their work is important.
Luis: The human brain is exceptionally good at nuanced decision-making, factoring in context, intention, and consequences. If somebody asks me about local politics, I adjust my delivery based on who I'm talking to (a neighbor or a tourist) and the context, like if I was asked during a job interview or over drinks. AI can only make limited calculations of these factors, but people have a sixth sense about these things.
Nicole: If the goal is to produce human-like thinking and responses, then humans should be providing the necessary information and control for that. Moreover, there are problems that are ethical in nature that can be solved by human oversight. Humans can ensure that AI learns in an unbiased manner.
Feyaza: There was a recent prompt that I thought was hilarious, and it said, 'Why do we sometimes find dolls scary?' The answer is that our brain is wired to look at a face within 14 milliseconds or something. And you sort of recognize the face and then you look for a mind, a person to connect to. And that's exactly how I feel about our AI writing tasks. Because if we do not feel connected to these prompts, then we won't believe them. And if we don't believe them, we don't trust them, and then they're not credible. So I think humans have to be part of the process.