Diversity and opportunity in the crowd: 2023 global survey of Tolokers

by Toloka Team

Since Toloka’s founding in 2014, the data labeling platform has grown and evolved into a data-centric environment with enterprise solutions for AI and ML development. Toloka’s platform runs on a robust, secure infrastructure that supports a combination of ML technologies like adaptive AutoML and best-in-class crowdsourcing technologies for data quality management. But the backbone of our solutions is still the human insight gathered from the crowd.

We’ve built one of the largest and most diverse data labeling crowds on the planet, with millions of registered users and hundreds of thousands of people actively earning money each month. But who are these people we call Tolokers, the crowd contributors behind the usernames?

A recent global survey of Tolokers (10,000 respondents) offers a glimpse into what our global community looks like in 2023. The goals of the survey were to explore the identity of Tolokers, learn what motivates them, and find out how happy they are with Toloka.

Here’s what we discovered.

Tolokers: age and gender

Age-wise, the Toloka community is mostly young. The majority of Tolokers, 55%, fall within the 20 to 30-year-old age bracket, while 22% are between 30 and 40 years old. Only 1% of the crowd is over 60 years old. The single most common age is 23, representing 7% of all Tolokers.

Figure: Age of Tolokers

When it comes to gender, male Tolokers outnumber other genders, with about 62% identifying as male, 36% as female, and 2% as non-binary.

Figure: Gender of Tolokers

Ethnicity and nationality

Three racial identities are equally represented in the global crowd, at about 30% each: Asian, African, and white (from across all continents). Indigenous and Polynesian ethnicities represent smaller fractions of our community, but they are particularly important because of their ability to speak rare and culturally significant languages, some of them on the brink of extinction.

Tolokers live in more than 100 different countries, spread across every time zone in the world. No single region dominates, but some of the larger concentrations are in Pakistan, Kenya, Brazil, Turkey, India, Egypt, the Philippines, and the US.

Languages

When asked to identify their first language, Tolokers list over 40 major languages. About half of Tolokers are native speakers of English, Urdu, Arabic, Russian, or Spanish. But the other half represents dozens of different languages, showcasing the diversity of our crowd: Portuguese, Ukrainian, French, German, Italian, Polish, Latvian, Bulgarian, Czech, Turkish, Hindi, Vietnamese, Japanese, Chinese, Korean, and Indonesian, to name a few. We even have Tolokers who speak uncommon languages like Tatar and Quechua.

Almost all Tolokers speak at least some English, even if English is not their first language. Nearly 30% of our survey respondents describe their English skills as advanced. Only about 3% of Tolokers do not speak any English.

In addition to English, 20% of Tolokers have some proficiency in other languages. The most common non-native languages spoken by Tolokers are Spanish, French, Arabic, German, Russian, Portuguese, and Italian.

Tolokers bring a variety of linguistic skills to data collection, data labeling, and data evaluation processes (including RLHF for large language models).

Urban demographics and social status

Tolokers come from bustling cities, quiet towns, and everything in between — there is no dominating urban or rural lifestyle. Almost a quarter of Tolokers reside in large cities with over one million residents, while roughly 23% live in small cities (10,000 to 100,000 people) and 22% call mid-sized cities their home (100,000 to 500,000 people). A sizable 20% of Tolokers live in smaller towns of fewer than 10,000 residents.

Regarding social class, Tolokers are evenly divided between middle class and working class.

A variety of religious faiths are represented in the crowd. Nearly 71% of Tolokers affirm some form of religious belief, while 20% are not religious and 9% are unsure.

Family and household

In terms of marital status, the majority of Tolokers (almost 60%) are single, while almost 30% are married. Of the rest, about 7% live with a partner and roughly 3% are separated or divorced.

Tolokers live in households of varying sizes, which may include relatives, partners, and housemates in shared accommodation. The survey data shows that most (over 50%) live in households of three to five members. Households with one to two individuals make up roughly 20% of the total, while the remaining quarter of the respondents live in households with five or more individuals.

Parental status also varies among Tolokers. The majority (around 65%) do not have children. Just over 15% have one child, while close to 15% have two to three children. Families with four or more children make up a smaller fraction of the community.

Education and occupation

Tolokers have a wide range of educational backgrounds, but the majority have pursued higher education. Over a third (almost 37%) hold a bachelor’s degree, while close to 18% have not continued their education past high school. About 16.5% have attended some college, while slightly over 10% have a master’s degree. Notably, over three quarters of all Tolokers have some form of additional education, such as courses, professional certificates, or other training.

Figure: Education completed by Tolokers

Employment status varies for Tolokers, but most use Toloka to supplement their income from another job. About a quarter of them do freelance work, while 22% have full-time jobs. Another 21% are currently unemployed and seeking work, 16% work part-time, and 5% run or are involved in a business.

Figure: Employment status of Tolokers

Tolokers work in a variety of industries, bringing a range of skills and expertise to the platform. The service sector has the largest representation, followed by finance, education, and manufacturing.

Many Tolokers are at the beginning of their careers. About 40% have one to three years of work experience. This is followed by those with five to ten years of experience (16%), four to five years (15%), and ten to twenty years (12%).

Figure: Years of work experience

When it comes to household income, nearly 20% of all surveyed Tolokers are the primary breadwinners in their household, while about 12% report that their spouse is the chief earner. Roughly 30% describe their income as somewhat stable, 22% as quite stable, and 6.5% as very stable, with the remaining contributors reporting that their monthly income fluctuates.

Toloker happiness

We asked Tolokers how they feel about being a Toloker and why they joined. In total, 85% responded that they feel good or very good about Toloka. Three-quarters of the contributors think that Toloka is a great way to supplement their main income, while a quarter of the respondents rely on Toloka as their main source of earnings.

Figure: How Tolokers feel about Toloka

Here are their top reasons to use Toloka for earning extra income:

  • It’s a great way to develop new skills and pave the way for professional advancement (20%)
  • It’s a chance to make the world a better place (15%)
  • It’s fun (10%)

Seeing the real people in the crowd

The survey findings confirm the diversity of our crowd, which is important for preventing bias in training data. They also help us see the real people who are contributing their expertise to make AI better. The most valuable outcome is that Tolokers feel good about what they are doing.

The Toloka team believes that a positive environment for annotators is an essential part of Responsible AI. This includes offering annotators fair wages, opportunities to develop new skills, and the freedom to choose their own tasks, hours, and locations. To learn about how we set up fair wages and access to tasks for the BigCode project, read our blog post.
