“It helps me learn and earn”: Toloka reports results of a global survey of Tolokers in 2022

by Toloka Team

Why a diverse and global crowd makes for better AI

The world of machine learning is realizing the need for data collected from the widest possible demographic. Otherwise, the algorithms learn to mimic the biases, prejudices and culturally-specific understandings of a narrow group of people. Some businesses have even built products that work differently based on gender and skin color, compromising both the ethics and the effectiveness of AI.

By contrast, algorithms that get this right work in more places, deliver more value, and serve a global market. Whether it's for data collection, labeling, or building human-in-the-loop pipelines, more and more engineers are coming to understand the need for a truly universal human perspective. That's why they want training data labeled by the widest possible mix of genders, nationalities, religions, and social and working backgrounds.

That's why they want training data labeled by the most diverse crowd available.

That's why they want training data labeled by Toloka.

Let's take a closer look at the crowd behind it.

Tolokers in 2022

As of March 2022, Toloka has 245,000 monthly active performers worldwide, 20% more than a year earlier. Of the world's 217 countries, Toloka has active performers in 123. Tolokers can access the platform interface in the following languages: English, Spanish, French, Vietnamese, Russian, Uzbek, Turkish, Indonesian, German, Italian, Brazilian Portuguese, Polish, Chinese, Korean, Japanese, and Hindi.

Out of the 10,000 Tolokers surveyed* worldwide, only 34.3% identify as White / Caucasian, whereas 23.7% are Black, 29.0% Asian, and 2.2% Indigenous. Global Tolokers belong to over 600 different ethnicities, from Filipino to Kinh, and from Venezolana to Kikuyu.

Tolokers are 29.8 years old on average, and their gender distribution is 39.7% female, 58.8% male, and 0.8% non-binary. The majority of them are single (63.7%) and have no children (70.7%), presumably due to their young age.

Socioeconomic status

Considerable evidence suggests that socioeconomic status, broadly defined, explains a range of important features, from judgments to life outcomes.

Tolokers come from countries with wildly differing economies, from Burundi, where the GDP per capita is $274, to Norway, where it's $67,390. Matching this global range, Tolokers' individual income also varies from under $1,000 to $100,000 per year. The global median annual income of Tolokers is about $4,000. Most self-identify as working or lower-middle class.

Education and occupation

Global Tolokers appear well-educated: 82.9% report finishing at least some college, and 4.7% have advanced degrees. 72% report intermediate to advanced English skills (Toloka verifies performers' language skills using the platform's quality control tools). Tolokers work in many industries, from service to manufacturing, and from healthcare to finance.

Religion and culture

66.9% of Tolokers identify themselves as religious or spiritual. Tolokers represent a diverse spectrum of world religions, from Judaism to Shintō, and from Muism to Cao Dai, most of them observing Islam (46.6%), Christianity (40.4%), Hinduism (7.3%), and Buddhism (2.4%).

Tolokers speak a variety of languages, including less widely spoken ones such as Namwanga, Ikwerre, Hiligaynon, and Ewe; in total, they report speaking over 800 different languages and dialects. Most Tolokers (98%) speak at least some English, and many of them also know Spanish (39%), French (36%), Arabic (31%), Portuguese (22%), German (21%), Turkish (21%), Italian (18%), and Russian (11%).

Among their hobbies, Tolokers mention sports, reading, computer games, music, cinema, and... Toloka!

COVID-19

When the global pandemic struck, we saw an influx of new users to Toloka, and many have since reported that the platform became a vital source of additional income. In this year's survey, 75% of Tolokers reported that they had lost at least some income during the pandemic years, so Toloka continues to provide important extra earnings to disadvantaged populations around the globe.

Why they do labeling

Is Toloka a main source of income for its users? Mostly not. 65.8% of Tolokers are permanently employed and 60.5% report having stable income. They use crowdsourcing for reasons other than breadwinning.

Actually, that's exactly what they say: many Tolokers report doing microwork for reasons other than earning money, such as for fun (17%), to make the world a better place (17%), to gain work experience and advance their careers (13%), or to be part of modern technology (11%). However, earning extra cash is on the table, too: 62% of Tolokers mainly use Toloka as an additional source of income, while for 18% of performers it is the main source.

Tolokers also just enjoy data labeling: 81% anonymously report that they love Toloka. Among their reasons, they mention Toloka's flexibility ("you can pick what you like"; "even when I take care of my child at home I can earn to help my spouse with the everyday expenses") and learning opportunities ("it stimulates my visual, logical, and intellectual capacity"; "helps me practice foreign languages"; "with Toloka, I can learn and earn"). Some Tolokers confess that Toloka refreshes their memory of school lessons and helps them kill time. Generally, Tolokers seem to share our own belief that the platform is "good for both people and the improving technologies."

This will only become more important

As algorithms take over more and more meaningful decisions, it matters more and more that these algorithms learn from high-quality, diverse, and representative data. With Toloka you can harvest quality training data with a truly global perspective, and do so responsibly.

* We surveyed a randomized sample of ~5% of weekly active Tolokers, weighted by performers' verified country of residence to reflect the up-to-date regional distribution of Tolokers who submit real tasks. The resulting sample size was 10,000 Tolokers.
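
For readers curious about what weighting by country can look like in practice, here is a minimal post-stratification sketch in Python. It shows one common way to do this kind of adjustment and is not a description of Toloka's actual procedure. The function names, the two countries, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def poststratification_weights(sample_countries, population_shares):
    """Return one weight per country: population share / sample share."""
    n = len(sample_countries)
    counts = Counter(sample_countries)
    return {
        country: population_shares[country] / (count / n)
        for country, count in counts.items()
    }


def weighted_share(responses, weights, predicate):
    """Weighted fraction of respondents for whom predicate(response) is true."""
    total = sum(weights[r["country"]] for r in responses)
    hits = sum(weights[r["country"]] for r in responses if predicate(r))
    return hits / total


# Hypothetical toy data: country "A" is over-represented in the sample.
responses = (
    [{"country": "A", "employed": True}] * 6
    + [{"country": "A", "employed": False}] * 2
    + [{"country": "B", "employed": True}] * 1
    + [{"country": "B", "employed": False}] * 1
)

# Assumed shares of active users per country (not real figures).
population_shares = {"A": 0.5, "B": 0.5}

weights = poststratification_weights(
    [r["country"] for r in responses], population_shares
)
raw = sum(r["employed"] for r in responses) / len(responses)
adjusted = weighted_share(responses, weights, lambda r: r["employed"])
print(f"raw: {raw:.3f}, weighted: {adjusted:.3f}")
```

In this toy sample, answers from the under-represented country count for more, so the weighted figure moves toward what a sample matching the assumed regional distribution would show.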
