Searching for a Prompt Engineer with 10+ years of experience?

Evgeniya Sukhodolskaya

Sep 27, 2023

Insights

Prompt engineering has been around for a while, but the term has gone viral with the recent boom in large language models (LLMs). The main driver of the phenomenon is that zero-shot and few-shot learning work well only with language models of a very large size.

The hype around prompt engineering was inevitable: after all, prompt wording can have a dramatic effect on the quality of LLM output (LLMs exhibit large variance across different prompt templates). It is also an effective way to adapt language models to downstream tasks without retraining any model parameters or gathering the data required for traditional fine-tuning.

The research literature suggests a variety of promising prompt engineering techniques. But we have yet to discover a universal “find the best prompt for a certain task” algorithm suitable for every LLM. We have to experiment with each specific model and task to find a prompt that produces the desired output. That’s why the “Prompt Engineer” job became a thing.

Prompt Engineers

As with every hyped position in IT (speaking from my own experience as a Developer Advocate), there is a lot of confusion among both employers and candidates about what a Prompt Engineer has to know. The confusion is made worse by the common perception that prompt engineering is just reformulating questions to an oracle until you get the desired answer, something like a patient parent doing math homework with their kid. On the contrary, prompt engineering is a very deliberate procedure. Even more importantly, “prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs”.

There are plenty of helpful guides, courses, and articles on prompt engineering techniques and best practices, which I used myself to get the hang of it. In most of them, you will see the same recommendation: good prompts are born out of experimentation, so you need to establish your own process for designing a prompt and develop a sixth sense for prompt engineering. Obviously, some of that intuition can come from work experience as a Machine Learning Engineer, which helps develop a deeper understanding of how LLMs are built and trained. But there are some less obvious professions that naturally develop strong intuition in prompt engineering. One of them is a Crowd Solution Architect (CSA).

Crowd Solution Architects

It might seem like I’m replacing one trendy, misunderstood IT job title with another that is just as baffling and newfangled. It’s actually not that new: Crowd Solution Architects have been around since the emergence of crowdsourcing platforms, so CSAs can easily have 15+ years of experience. The industry already has a solid understanding of the professional scope of a CSA.

So, what are their day-to-day tasks, and why can their experience be helpful for developing intuition in prompt engineering? To put it simply, CSAs design data labeling tasks for a crowd (expert or non-expert), converting initial specifications to a format that is easy for labelers to understand. This includes writing detailed instructions, creating a user-friendly labeling interface, setting up a fair pricing scheme and quality-control mechanisms, and, most importantly, applying sophisticated decomposition techniques to an initial problem. Decomposition is an important skill that turns a hard task into a set of easier subtasks. An example is to transform a ranking problem into a side-by-side labeling task followed by noisy Bradley-Terry aggregation. The subtasks are solved by the crowd and then CSAs apply various aggregation techniques to get the final answer to the problem.
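
To make this concrete, here is a minimal sketch of the aggregation step: side-by-side votes treated as pairwise wins and fed into a plain Bradley-Terry fit. The item names and vote counts are invented for illustration, and in practice you would reach for a dedicated aggregation library rather than a hand-rolled loop.

```python
# Minimal Bradley-Terry aggregation: turn pairwise "A beat B" votes into global scores.
# Item names and vote counts are illustrative, not real project data.
from collections import defaultdict

def bradley_terry(pairwise_wins, n_iter=100):
    """pairwise_wins: dict mapping (winner, loser) -> number of votes."""
    items = {item for pair in pairwise_wins for item in pair}
    scores = {item: 1.0 for item in items}
    wins = defaultdict(float)   # total wins per item
    games = defaultdict(float)  # total comparisons per unordered pair
    for (winner, loser), count in pairwise_wins.items():
        wins[winner] += count
        games[frozenset((winner, loser))] += count
    for _ in range(n_iter):  # classic minorization-maximization updates
        new_scores = {}
        for i in items:
            denom = sum(
                games[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i and games[frozenset((i, j))] > 0
            )
            new_scores[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(new_scores.values())
        scores = {i: s / total for i, s in new_scores.items()}
    return scores

votes = {
    ("doc_A", "doc_B"): 7, ("doc_B", "doc_A"): 3,
    ("doc_A", "doc_C"): 8, ("doc_C", "doc_A"): 2,
    ("doc_B", "doc_C"): 6, ("doc_C", "doc_B"): 4,
}
print(sorted(bradley_terry(votes).items(), key=lambda kv: -kv[1]))  # doc_A ranks first
```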

Why does designing crowdsourcing tasks help develop intuition in prompt engineering?

While experimenting with prompt engineering, I noticed similarities between best practices in designing crowdsourcing tasks and best practices in prompt engineering. This led me to develop and test several hypotheses on how CSA methods could be used for improving prompt results. I strongly believe that new prompting techniques may be discovered by applying knowledge from adjacent areas of research, such as crowdsourcing.

Decomposition + Aggregation

Techniques like chain-of-thought prompting and least-to-most prompting demonstrate that a model performs better if a task is divided into subtasks or steps. Decomposition works the same way for crowdsourcing tasks by breaking them down into smaller problems that are much easier to solve. It is intuitively easy to understand why such a technique helps to increase human labeling quality — it makes it easier to concentrate and requires less expertise from labelers. For LLMs it might work well because the subtasks might occur more frequently in LLM training datasets scraped from the web.
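
As an illustration, here is a minimal sketch of least-to-most-style decomposition. The `llm(prompt)` callable is a hypothetical wrapper around whatever model you actually use, and the prompts themselves are only examples of the pattern, not a recipe.

```python
# Least-to-most-style decomposition: ask the model for subproblems first, then
# solve them in order, feeding each intermediate answer back into the context.
# `llm(prompt) -> str` is a hypothetical wrapper around your model of choice.

def solve_by_decomposition(llm, problem: str) -> str:
    plan = llm(
        "Break the following problem into a short numbered list of simpler "
        f"subproblems, without solving them:\n\n{problem}"
    )
    subproblems = [line.strip() for line in plan.splitlines() if line.strip()]
    context = f"Problem: {problem}\n"
    answer = ""
    for subproblem in subproblems:
        answer = llm(
            f"{context}\nUsing the work so far, solve only this subproblem:\n{subproblem}"
        )
        context += f"\n{subproblem}\nAnswer: {answer}\n"
    return answer  # the last subproblem resolves the original task
```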

CSAs have learned in practice how to decompose tasks for search relevance estimation, 3D transport labeling, ad moderation, personal information recognition in code, and many other use cases. They have developed a strong intuition for what is hard and what is easy to solve with a crowd. This intuition can be partially captured in a set of best practices. For example, there is a rule of thumb that the number of classes in a classification task shouldn’t exceed 5 or 6, otherwise choice overload bias comes into play. Another rule is that eliminating classes one by one, from simple to complex, yields better results than labeling all classes at once. Can these rules be applied to labeling with LLMs? It seems so! We applied the latter rule to labeling with GPT-4, and the results showed 20% higher accuracy.
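
Here is a rough sketch of what one-by-one class elimination can look like for LLM labeling. Again, `llm(prompt)` is a hypothetical model wrapper, and the classes and their ordering are invented for illustration rather than taken from our GPT-4 experiment.

```python
# One-by-one class elimination: one yes/no question per class, ordered from the
# simplest to the most nuanced, instead of a single multi-class prompt.
# `llm(prompt) -> str` is a hypothetical model wrapper; the classes are made up.

CLASSES_SIMPLE_TO_COMPLEX = [
    ("spam", "Is this text unsolicited advertising or spam?"),
    ("profanity", "Does this text contain profanity or insults?"),
    ("personal_data", "Does this text disclose someone's personal data?"),
    ("other_violation", "Does this text violate the content policy in any other way?"),
]

def classify_by_elimination(llm, text: str) -> str:
    for label, question in CLASSES_SIMPLE_TO_COMPLEX:
        verdict = llm(f"{question}\n\nText: {text}\n\nAnswer strictly Yes or No.")
        if verdict.strip().lower().startswith("yes"):
            return label  # stop at the first class that is not eliminated
    return "acceptable"  # every class was eliminated
```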

When there is decomposition, there is also aggregation. For example, when decomposing a ranking task into side-by-side labeling, one needs to aggregate the results with the Bradley-Terry model (or any other suitable one). The closest analog of aggregation in prompt engineering is, from my perspective, self-consistency with chain-of-thought prompting: it boils down to majority-vote aggregation over sampled answers. Testing more sophisticated aggregation techniques for labeling with LLMs looks like a very interesting research problem.
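
As a minimal illustration of that majority vote, here is what self-consistency boils down to: sample several reasoning paths and keep the most frequent final answer. The `llm(prompt, temperature)` callable is a hypothetical sampling wrapper, and the answer extraction is deliberately crude.

```python
# Self-consistency as aggregation: sample several chain-of-thought completions
# and take a majority vote over the final answers.
# `llm(prompt, temperature) -> str` is a hypothetical sampling wrapper.
from collections import Counter

def self_consistent_answer(llm, question: str, n_samples: int = 5) -> str:
    prompt = (
        f"{question}\n"
        "Think step by step, then give only the final answer on the last line."
    )
    final_answers = []
    for _ in range(n_samples):
        completion = llm(prompt, temperature=0.8)  # diverse reasoning paths
        final_answers.append(completion.strip().splitlines()[-1])  # crude extraction
    return Counter(final_answers).most_common(1)[0][0]  # majority vote
```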

Instructions

One of the keys to getting good results with crowdsourcing is to provide clear and detailed instructions (basically, a detailed prompt for humans). Since LLMs do not reason the way people do, instructions that are perfect for human annotators might not lead to the best output quality, as shown in the article “The Turking Test: Can Language Models Understand Instructions?”.

However, some best practices for prompt engineering and for crowdsourcing instruction design match one-to-one, which suggests that a skilled CSA could make a promising prompt engineer. Here are some practices that apply to both prompts and instructions (a template sketch follows the list):

  • Short does not equal good. Be as clear as possible.

  • Since LLMs are few-shot learners, just like we are, examples matter a lot.

  • If there are multiple possible classes in the task, provide enough examples (at least 2–3) to illustrate each of them.

  • Use real-life (not synthetic) examples for your few-shots.

  • If you do add rare cases to the task, make sure to explain them well in the instructions.
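
To tie these practices together, here is an illustrative prompt template for a made-up review-labeling task: an explicit task description, real-looking examples, and an explained rare case. It is trimmed to one example per class for brevity; in practice you would include 2–3 per class.

```python
# Illustrative prompt template following the practices above. The task, classes,
# and examples are invented; trimmed to one example per class for brevity.
PROMPT_TEMPLATE = """You are labeling customer reviews of a food-delivery app.
Assign exactly one label: POSITIVE, NEGATIVE, or MIXED.

Rules:
- POSITIVE: the reviewer is satisfied overall, even with minor complaints.
- NEGATIVE: the reviewer is dissatisfied overall.
- MIXED: clearly balanced praise and criticism. Rare case: if the review only
  states facts with no sentiment (e.g. "the courier arrived at 6 pm"), label it
  MIXED as well.

Examples:
Review: "Courier was 40 minutes late and the soup arrived cold." -> NEGATIVE
Review: "Great selection of restaurants, the app just works." -> POSITIVE
Review: "Food was tasty, but support ignored my refund request." -> MIXED

Review: "{review}" ->"""

print(PROMPT_TEMPLATE.format(review="Fast delivery, will order again."))
```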

Key takeaways

  • There is a promising, largely unexplored research field in applying CSA techniques to prompt engineering.

  • Intuition in prompt engineering can certainly be gained through other work experience.

  • At Toloka, we successfully use our CSAs as prompt engineers. If you happen to have someone on your team who has worked with crowdsourcing, you don’t have to search for a prompt engineer!

Article written by:

Evgeniya Sukhodolskaya

Updated:

Sep 27, 2023
