
Ivan Yamshchikov

Oct 10, 2023

Insights

Tech spec for your own LLM

In the fast-evolving landscape of AI, Large Language Models (LLMs) stand out as the most rapidly progressing subfield, offering transformative potential for business applications. Just over the past month, we've seen a flurry of activity in this space. New tech startups are sprouting up, and established players are unveiling their LLM offerings. For instance, EY recently launched its EY.AI platform, marking their confidence in this tech with a staggering $1.4 billion investment. While the market heats up, it's still up for debate whether supply is outpacing demand.

From my point of view, there remains a strong case for developing a custom LLM tailored to the distinct needs of your business. The case is even stronger if your company has a considerable amount of data that you can leverage, or if your data science team has tested generic solutions for your business cases and you see an open opportunity in the market.

Recently, we at Toloka AI embarked on an exciting journey: crafting a new LLM from the ground up. We are collaborating with another AI startup as a data partner responsible for data collection, refinement, evaluation and continuous monitoring. Through this process, we've gained invaluable insights to share with you. And trust me, there's more on the horizon.

This post kicks off a series delving into the nuts and bolts of building an LLM from scratch, but today we want to present the big picture: where do you begin? Before a single line of code is written or a single data point is collected, it's imperative to outline your vision for the model. Let's dive into the first step: crafting the tech specification for your LLM.

Project scope and objectives

It's crucial to identify the specific challenges your model is designed to address. Currently, expertise in particular languages is a notable differentiator. If you're tailoring your LLM for specialized tasks, such as legal document processing or managerial decision support, this will influence your data sourcing and labeling strategies. It's wise to define your primary market niche early on: What niche will your LLM serve? What areas will it excel in? What unique value proposition are you aiming to offer?

While the adaptability of LLMs is commendable, it can also be a double-edged sword. The market is already saturated with general-purpose LLMs. Thus, carving out specific business scenarios where your model can outperform competitors becomes an essential first move.
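To make this concrete, it can help to pin scope decisions down in a structured artifact alongside the prose of your spec. Below is a minimal sketch of what such a scope section might look like in code; the fields and example values are purely illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class LLMScopeSpec:
    """Illustrative scope section of an LLM tech spec (all fields hypothetical)."""
    niche: str                       # the market niche the model serves
    target_tasks: list[str]          # scenarios where it must beat generic LLMs
    languages: list[str]             # languages the model must be strong in
    value_proposition: str           # what the model offers that competitors don't
    out_of_scope: list[str] = field(default_factory=list)  # excluded use cases

# Example values are invented for illustration.
spec = LLMScopeSpec(
    niche="legal document processing for mid-size law firms",
    target_tasks=["contract clause extraction", "case summary drafting"],
    languages=["en", "de", "fr"],
    value_proposition="domain accuracy on legal text that generic LLMs lack",
    out_of_scope=["giving binding legal advice to end clients"],
)
```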

Data acquisition and management

Data serves as the foundation of any LLM. It's vital to ensure that your data sources align with your objectives and come from a mix of diverse origins. Beyond mere collection, data often demands rigorous cleaning and preprocessing to meet quality standards. In today's digital landscape, prioritizing data privacy and ethical standards isn't just commendable—it's non-negotiable.

Our collaboration with HuggingFace on the BigCode project underscored a fundamental truth: the quality of your data profoundly impacts your model's performance. If you possess proprietary data, list all available resources. Seek out open data that aligns with the business scenarios you defined earlier. And if there's an opportunity to enhance your data with human insights, seize it. Your diligence will pay dividends in the final outcomes.
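To give a flavor of what "rigorous cleaning and preprocessing" involves in practice, here is a minimal sketch of a first-pass filter over a text corpus: whitespace normalization, removal of fragments too short to carry signal, and exact deduplication. The thresholds and heuristics are illustrative assumptions, not the pipeline we or the BigCode project actually use.

```python
import hashlib
import re

def clean_corpus(docs: list[str], min_words: int = 20) -> list[str]:
    """Toy cleaning pass: normalize whitespace, drop short docs, deduplicate."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for doc in docs:
        text = re.sub(r"\s+", " ", doc).strip()   # normalize whitespace
        if len(text.split()) < min_words:          # drop low-signal fragments
            continue
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if digest in seen:                         # exact-duplicate removal
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned
```

Real pipelines go much further, with near-duplicate detection, PII scrubbing, and quality classifiers, but even a pass like this catches a surprising amount of noise.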

Safety and bias mitigation

AI, for all its transformative potential, has its challenges. Biased outputs from LLMs can not only distort decisions but also cause serious damage and tarnish reputations. It's essential to implement robust measures to identify biases and introduce safeguards against undesired outputs, ensuring that the LLM remains compliant and trustworthy.

As we build our language model, we constantly face the challenge of detecting and mitigating various biases. If you want to dig deeper into this topic, see the recent webinar by Toloka developer advocate Prof. Ujwal Gadiraju. Cultural nuances heavily influence design choices. Here, human perspective is indispensable. In upcoming posts, we'll explore how diverse "Tolokers" identify distinct biases that must be addressed before deployment. While enlisting ethicists provides valuable insights, it's no silver bullet. A blended approach of crowdsourcing that taps into varied cultural and educational backgrounds, paired with expert guidance, offers the most holistic assurance of quality.
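As a small illustration of what an automated safeguard can look like, the sketch below combines a hand-written blocklist with a pluggable classifier hook. This is a toy example under stated assumptions: the patterns, the `toxicity_scorer` callable, and the threshold are all hypothetical, and a real deployment would lean on trained classifiers plus the human review described above.

```python
import re
from typing import Callable

# Illustrative patterns only; a production blocklist would be far more extensive.
BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE)
                    for p in [r"\bcredit card number\b", r"\bhome address\b"]]

def is_safe(output: str,
            toxicity_scorer: Callable[[str], float],
            threshold: float = 0.5) -> bool:
    """Return False if a model output trips a rule or scores as likely toxic."""
    if any(p.search(output) for p in BLOCKED_PATTERNS):
        return False
    return toxicity_scorer(output) < threshold  # hypothetical classifier hook
```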

Evaluation metrics and benchmarks

To effectively assess your LLM's performance, it's essential to have well-defined metrics and benchmarks. Whatever evaluation methodology you prefer, whether an open or closed benchmark, a data labeling pipeline, or scoring with a stand-alone model, choose one and stick to it. This consistency gives you clarity on your LLM's capabilities. Additionally, comparing your LLM with existing solutions provides insight into its market standing.

Evaluation remains a focal point in LLM research, with many initiatives underway to refine and bolster assessment methods. For those keen on this subject, stay tuned: we're on the cusp of presenting Toloka's innovative approach to LLM evaluation methodology.
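Whatever methodology you settle on, the mechanics reduce to the same loop: a frozen set of prompts, a fixed scoring rule, and a single number you can track across model versions. Here is a minimal sketch, assuming a `generate` callable for your model and exact-match scoring; both are stand-ins for whatever interface and metric you actually choose.

```python
def evaluate(generate, benchmark: list[tuple[str, str]]) -> float:
    """Exact-match accuracy over a frozen (prompt, reference) benchmark."""
    correct = sum(
        generate(prompt).strip() == reference.strip()
        for prompt, reference in benchmark
    )
    return correct / len(benchmark)

# Fix the benchmark once, then compare versions on the same number:
# score_v1 = evaluate(model_v1.generate, benchmark)
# score_v2 = evaluate(model_v2.generate, benchmark)
```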

Monitoring and maintenance

Post-deployment, the journey with your LLM is just beginning. Constant performance monitoring reveals areas for refinement. Additionally, as data landscapes shift and business demands evolve, scheduled model updates and retraining become vital.

For context, a recent study from Stanford and Berkeley observed temporal shifts in ChatGPT's performance. Since ChatGPT is proprietary, the findings invite speculation rather than concrete conclusions, but it's reasonable to hypothesize that the changes are linked to human feedback loops. While user feedback is invaluable for refining models, without vigilant oversight it can lead to unforeseen challenges. If you don't trust me, read this cautionary tale on how unchecked feedback made a chatbot produce inappropriate content in under 24 hours.
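One lightweight way to catch this kind of drift is to re-run your fixed evaluation on a schedule and alert on regressions. A minimal sketch, reusing the `evaluate` function from the previous section; the tolerance value is an illustrative assumption you would tune to your metric's noise.

```python
def check_for_drift(history: list[float], latest: float,
                    tolerance: float = 0.02) -> bool:
    """Flag a regression when the latest eval score falls more than
    `tolerance` below the best historical score (illustrative threshold)."""
    if not history:
        return False
    return latest < max(history) - tolerance

# Run on a schedule, append each score to history, and act on True:
# history = [0.81, 0.83, 0.82]
# if check_for_drift(history, latest=0.78):
#     print("Eval score regressed; inspect recent feedback and retraining runs")
```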

Conclusion and future directions

Embarking on the path to craft a custom LLM is both ambitious and rewarding, with the capacity to significantly elevate your business operations. But before delving into such a substantial investment, consider the following pivotal questions:

— What's your niche? Where will your LLM fit within the market landscape?

— How will you source and refine your data? Can you infuse human insights and secure datasets that lend your model a competitive edge?

— What pitfalls must your model avoid? How can you ensure comprehensive coverage of potential undesired outcomes and mitigate risks?

— Which evaluation methods will best measure your LLM's efficacy?

— What strategies will underpin ongoing maintenance and optimize the feedback loop?

By addressing these questions, you're setting the foundation for an LLM that's both innovative and grounded in strategic foresight.

Good luck! If you already have some answers, reach out to discuss — and consider Toloka as a data partner for your shiny new LLM.

Article written by:

Ivan Yamshchikov

Updated:

Oct 10, 2023
