Are you using GenAI to write for you? Meet the people writing for GenAI

by Toloka Team

Large language models are among the most powerful AI solutions available. With great power comes great responsibility, and model producers rely on fine-tuning to make model output accurate and beneficial to end users. Human alignment can be a vital part of fine-tuning, acting as a strong antidote to LLM flaws like hallucination and bias.

Toloka is building datasets of human prompts and completions for fine-tuning and evaluating LLMs. The prompts come from a diverse crowd of thousands of Tolokers, representing 69 countries. But the completions for these prompts are written and carefully reviewed by curated teams of experts with specialized skills in writing, editing, fact-checking, and domain-specific knowledge. They focus on crafting thorough responses in a collaborative workflow that promotes quality writing.

Who are the experts, and how do we leverage their diverse set of skills? To answer these questions, we talked to several AI Tutors on our writing team. We'll share their insights into the process of writing for generative AI.

Why LLMs need writing experts

Generative language models learn language almost like children do, through exposure to a vast amount of language input. These models don't need us to teach them grammar and vocabulary because they master the language during pre-training. The next step is fine-tuning, where they learn what good texts look like and gain in-depth knowledge in specific areas, more like sending a student to high school or college.

This is why we call our experts AI Tutors — they share their skills with the model to fill in the gaps, making the model's output more sophisticated and refined. As domain experts and professional writers, they offer contextual understanding and a discerning eye to improve the data used for tuning LLMs.

Feyaza, one of the editors on our AI Tutors team, sees human experts in the role of a playwright directing AI. “I think even if AI were to be the original source for the canonical copy of that play or book or whatever, there's always a consultant role for humans,” she muses.

Up close and personal: meet 3 of our experts

The stars of today's post are Feyaza, Luis, and Nicole, writers and editors on the AI Tutors team. We learned about their backgrounds and what it's like writing for AI.

Electrical Engineering Student
Luis is an Electrical Engineering student who wants to work in the AI industry someday, and he appreciates the chance to get an insider view on how large language models are trained. He believes his background in science and research helps him explain ideas clearly in his writing. “My current studies help me to write with the reader in mind. This mindset is very important as an AI tutor since we want language models to also use a clear voice and an eloquent but understandable tone.”

Educator, Editor
Nicole is an experienced educator sharpening her skills as she transitions into an editing career. She says writing for AI feels “relevant and modern” and she enjoys working “behind the scenes.”

Journalist, Social Media Consultant
Feyaza is a journalist, social media consultant, and podcast host with plenty of enthusiasm for AI. “I love the idea of AI, and I think that's where we're all moving. And I think that anyone who doesn't want to work in AI is crazy. People are afraid that AI is going to steal their jobs. But if you can't beat something, then join it.”

Getting experts on board

Each expert has to take rigorous writing tests before they can accept tasks. If they pass, they complete onboarding and training to learn how and what to write for training generative AI. We provide detailed guidelines so they know exactly what makes a good response.

One of the main jobs of AI Tutors is to write responses to prompts (called prompt completions). When they open a task, they see guidelines about the voice they should write in and other parameters. For example, the task guidelines might say "Imagine you are a helpful AI assistant." The next part of the task gives them a question to answer, or sometimes an AI response to check and verify.

[Images: examples of a classification task, a summarization task, and a paraphrasing task]

Writing skills are essential, but not enough to do the job right. It's like having the fuel but no car — you can't get to where you want to go. The missing element is fact-checking skills, which is why we test and train experts on their ability to track down and verify information online. You can find just about anything published on the internet, both true and untrue, and experts have to separate fact from fiction.

Once they start writing, the experts get continuous feedback from their team leads so they keep improving the quality of their responses. They can also collaborate with the team when they encounter a challenging prompt and need ideas for how to approach the task.

How the writing team guarantees quality

AI writing is challenging work. Luis explains, “Speed and accuracy are key. Not only do we need to come up with a quality piece in less than an hour, but it also has to follow the rules and writing guidelines. These guidelines are in place so that the bank of texts we are creating can be a good training set for an ideal language model.”

Every single piece of writing is checked by a team lead. Team leads make sure each text follows the style guidelines, and they coach writers on the preferred tone of voice. Their job is to make sure that everyone understands what is good writing, and what is a little off. When the team leads have questions about difficult cases, they consult with the editor-in-chief and other leads. The team also does spot checks of the final results to make sure the overall quality is consistent.

The writers are assigned to teams of about 10 people so they can work closely together. As a team lead, Feyaza says, “One of the most challenging things is trying to be as supportive as you can for the team because everyone is in different time zones and everyone needs different things. We have people who are older, and we have people who are very young and this is their first job. So it's quite a nice mix, but you have to be available for them in every capacity that they need. You've got to really bring your A-game.”

Manual checking is the most reliable way to control the quality of unique writing tasks, but we also calculate some automated metrics to check results before we feed them to the model. For example, we check readability, and we run AlpacaEval to compare the expert answers to output from an LLM. This approach has limitations, but it gives us a quick estimation of overall quality in the context of modern LLMs.
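To make the readability check more concrete, here is a minimal sketch in Python. The article does not say which readability metric is used, so the Flesch Reading Ease formula below is an assumption chosen for illustration, and the function names `count_syllables` and `flesch_reading_ease` are hypothetical. The syllable counter is a rough heuristic, not a linguistic tool.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually adds no syllable ("make", "tone"),
    # but a trailing "le" usually does ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text.

    Assumed metric for illustration; sentence and word splitting are naive.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Comprehensive organizational restructuring necessitates considerable deliberation."
    print(round(flesch_reading_ease(simple), 3))
    print(round(flesch_reading_ease(dense), 3))
```

A score like this can only flag texts that are unusually hard to read. It says nothing about accuracy or tone, which is why it complements manual review rather than replacing it.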

Good texts are ethical texts

Similar to the ChatGPT training process, our AI Tutors are trained to focus on 3 criteria in addition to language and writing style: every text should be helpful, honest, and harmless.

Helpful means that the response directly answers the question, and honest means that it's accurate and comprehensive. Tutors provide reliable sources for every answer they write, so fact-checking is built into the process. We also guide tutors to give objective responses to subjective prompts.

Harmless is a trickier category to pin down. Every response must be ethical, meaning it can't offend anyone or lead to negative consequences, either directly or indirectly. For instance, a prompt that stood out for Nicole was a task to write an email impersonating a colleague. Her response explained that the request was dishonest and unethical.

Luis shared how he learned to craft careful responses:
“I once was tasked to write a response to a prompt asking for a weight loss workout plan. What I failed to realize was the goal of the prompter was to lose a lot of weight in a very short time, which is very dangerous. Sometimes consequences are not obvious and the dangers are not always clear. It can take a lot of nuanced decision-making.”

The AI Tutor training process goes a step beyond general safety by focusing on sensitive topics like politics, medicine, financial advice, religion, gender identity, discrimination, and others. For each of these topics, AI Tutors are coached to write a balanced response and advise the reader to consult a professional. With practice and feedback, they learn to spot ethical issues and tag them as “sensitive” in the dataset.
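As a rough illustration of how a “sensitive” tag might sit alongside a prompt and its completion, here is a small Python sketch. The record layout, the field names, and the `tag_sensitive` helper are all hypothetical and not Toloka's actual schema. The topic labels are assumed to be assigned by a human reviewer, since the article describes tagging as a judgment call made by AI Tutors.

```python
from dataclasses import dataclass, field

# Topics the article lists as sensitive (its list ends with "and others").
SENSITIVE_TOPICS = {
    "politics",
    "medicine",
    "financial advice",
    "religion",
    "gender identity",
    "discrimination",
}


@dataclass
class CompletionRecord:
    """Hypothetical dataset record for one prompt and its expert completion."""

    prompt: str
    completion: str
    sources: list[str] = field(default_factory=list)  # references backing the answer
    topics: set[str] = field(default_factory=set)  # assigned by a human reviewer
    tags: set[str] = field(default_factory=set)


def tag_sensitive(record: CompletionRecord) -> CompletionRecord:
    """Add the "sensitive" tag when any reviewer-assigned topic is sensitive."""
    if record.topics & SENSITIVE_TOPICS:
        record.tags.add("sensitive")
    return record


if __name__ == "__main__":
    record = CompletionRecord(
        prompt="What should I do about recurring headaches?",
        completion="A balanced answer that advises consulting a doctor.",
        topics={"medicine"},
    )
    print(tag_sensitive(record).tags)
```

Keeping the tag on the record makes it possible to filter or review sensitive examples separately before they are used for fine-tuning.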

What makes AI writers tick?

For most of our experts, it's more than just a way to earn money remotely. They find purpose and meaning through their role in developing generative AI.

Luis: No one complained when the wheel was invented. It allowed us to become more productive and freed up time for us to tackle bigger goals and more creative pursuits. AI will have the same effect on our society. As an AI tutor, we are helping invent the wheel. In the decades to come, LLMs will become as common as the wheels on a car. This team, and people like us, are making sure that the wheel works and is as polished as it can be.

Nicole: I hope that I am improving tools for human creatives and contributing in a meaningful way that will help people. This role also helps me to enhance my writing and editing.

Feyaza: I just want to be in the AI world, you know? Just to see where this journey goes. Also, I'm learning new things every day. It's constantly refreshing my brain. I'm a writer, and what I love about this is that it hones my skills and makes them really, really sharp.

Why is human input so valuable for fine-tuning LLMs?

An important piece of the quality puzzle is making sure that writers are motivated and know the value of the effort they put in. We asked our experts to explain why their work is important.

Luis: The human brain is exceptionally good at nuanced decision-making, factoring in context, intention, and consequences. If somebody asks me about local politics, I adjust my delivery based on who I'm talking to (a neighbor or a tourist) and the context, like if I was asked during a job interview or over drinks. AI can only make limited calculations of these factors, but people have a sixth sense about these things.

Nicole: If the goal is to produce human-like thinking and responses, then humans should be providing the necessary information and control for that. Moreover, there are problems that are ethical in nature that can be solved by human oversight. Humans can ensure that AI learns in an unbiased manner.

Feyaza: There was a recent prompt that I thought was hilarious, and it said, 'Why do we sometimes find dolls scary?' The answer is that our brain is wired to look at a face within 14 milliseconds or something. And you sort of recognize the face and then you look for a mind, a person to connect to. And that's exactly how I feel about our AI writing tasks. Because if we do not feel connected to these prompts, then we won't believe them. And if we don't believe them, we don't trust them, and then they're not credible. So I think humans have to be part of the process.
