The world of machine learning is realizing the need for data collected from the widest possible demographic. Otherwise, the algorithms learn to mimic the biases, prejudices and culturally-specific understandings of a narrow group of people. Some businesses have even built products that work differently based on gender and skin color, compromising both the ethics and the effectiveness of AI.
Conversely, algorithms that get this right work in more places, deliver more value, and serve a global market. Whether it's for data collection, labeling, or building human-in-the-loop pipelines, more and more engineers are coming to understand the need for a truly universal human perspective. That's why they want training data labeled by the widest possible mix of genders, nationalities, religions, and social and working backgrounds.
That's why they want training data labeled by the most diverse crowd available.
That's why they want training data labeled by Toloka.
Let's take a closer look at the crowd behind it.
As of March 2022, Toloka has 245,000 monthly active performers worldwide, 20% more than a year earlier. Of the world's 217 countries, Toloka has active performers in 123. Tolokers can access the platform interface in the following languages: English, Spanish, French, Vietnamese, Russian, Uzbek, Turkish, Indonesian, German, Italian, Brazilian Portuguese, Polish, Chinese, Korean, Japanese, and Hindi.
Out of the 10,000 Tolokers surveyed* worldwide, only 34.3% identify as White / Caucasian, whereas 23.7% are Black, 29.0% Asian, and 2.2% Indigenous. Global Tolokers belong to over 600 different ethnicities, from Filipino to Kinh, and from Venezolana to Kikuyu.
Tolokers are 29.8 years old on average, and their gender distribution is 39.7% female, 58.8% male, 0.8% non-binary. The majority of them are single (63.7%) and have no children (70.7%), presumably due to their young age.
Considerable evidence suggests that socioeconomic status, broadly defined, explains a range of important features, from judgments to life outcomes.
Tolokers come from countries with wildly differing economies, from Burundi, where the GDP per capita is $274, to Norway, where it's $67,390. Matching this global range, Tolokers' individual income also varies, from under $1,000 to $100,000 a year. The global median annual income of Tolokers is about $4,000. Most self-identify as working or lower-middle class.
Global Tolokers appear well-educated: 82.9% report finishing at least some college, and 4.7% have advanced degrees. 72% report intermediate to advanced English skills (Toloka verifies performers' language skills using the platform's quality control tools). Tolokers work in many industries, from service to manufacturing, and from healthcare to finance.
66.9% of Tolokers identify themselves as religious or spiritual. Tolokers represent a diverse spectrum of world religions, from Judaism to Shintō, and from Muism to Cao Dai, most of them observing Islam (46.6%), Christianity (40.4%), Hinduism (7.3%), and Buddhism (2.4%).
Tolokers speak a variety of languages including exotic ones such as Namwanga, Ikwerre, Hiligaynon, or Ewe; in total, they report speaking over 800 different languages and dialects. Most Tolokers (98%) speak at least some English, and many of them also know Spanish (39%), French (36%), Arabic (31%), Portuguese (22%), German (21%), Turkish (21%), Italian (18%), and Russian (11%).
Among their hobbies, Tolokers mention sports, reading, computer games, music, cinema, and... Toloka!
When the global pandemic struck, we saw an influx of new users to Toloka. Since then, many have reported that Toloka has become a vital source of additional income. In this year's survey, 75% of Tolokers reported that they had lost at least some income during the pandemic years, so Toloka continues to provide important additional income to disadvantaged populations around the globe.
Is Toloka a main source of income for its users? Mostly not. 65.8% of Tolokers are permanently employed and 60.5% report having stable income. They use crowdsourcing for reasons other than breadwinning.
In fact, that's exactly what they say: many Tolokers report doing microwork for reasons other than earning money, such as for fun (17%), to make the world a better place (17%), to gain work experience and advance their careers (13%), or to be part of modern technology (11%). However, earning extra cash is on the table, too: 62% of Tolokers mainly use Toloka as an additional source of income, while for 18% of performers it is the main source.
Tolokers also just enjoy data labeling. 81% anonymously report they love Toloka. Among the reasons, they mention Toloka's flexibility ("you can pick what you like", "even when I take care of my child at home I can earn to help my spouse with the everyday expenses") and learning opportunities ("it stimulates my visual, logical, and intellectual capacity", "helps me practice foreign languages", "with Toloka, I can learn and earn"). Some Tolokers confess that Toloka refreshes their memory of school lessons and helps them kill time. Generally, Tolokers seem to share our own belief that the platform is "good for both people and the improving technologies."
As algorithms take over more and more meaningful decisions, it matters more and more that these algorithms learn from high-quality, diverse, and representative data. With Toloka you can harvest quality training data with a truly global perspective, and do so responsibly.