Data protection at Toloka

by Toloka Team

We live and breathe data at Toloka, which is why data protection and information security are top priorities for us. This is something we won’t ever compromise on. Read on to find out what actions we take to keep your data safe.

First of all, let's define the types of data we are talking about:

  • Information about our Tolokers (including PII).
  • Information about our requesters (including PII).
  • The data that our requesters provide for their projects.
  • The data that our Tolokers collect and label as they’re completing those projects.

What is PII?

PII, or personally identifiable information, is a type of sensitive data that can identify a person. It can contain personal information like full name, telephone number, email address, residential or postal address, license plate number, photos, and biometrics. This type of information may come from both our clients and annotators. PII is protected by privacy laws like GDPR (probably one of the toughest and best-known privacy regulations in the world). Violating these laws can lead to some very steep fines.

The data that Tolokers collect during tasks may contain personal information about other Tolokers or third parties (we share some examples of that later). The data that requesters upload to the platform for labeling may also contain third-party PII.

Where is data stored?

For data storage, we use servers provided by Microsoft Azure, which may be located in the US, the EU, or Asia. The client can choose where they’d like to keep their data: they can opt for virtual storage in a private cloud or for their own on-premises storage.

The client can:

  • Store data outside Toloka’s infrastructure.
  • Make use of a private cloud on the servers we use.
  • Choose the exact region of their servers.

All data is processed on application servers and transmitted through an API (or web interface) using an encrypted TLS channel.
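
As a minimal sketch of what an encrypted TLS channel looks like from the client side, the Python snippet below builds a TLS context that verifies the server certificate and refuses protocol versions older than TLS 1.2. The helper name `make_tls_context` and the TLS 1.2 floor are assumptions made for illustration; this article does not state which TLS versions the Toloka API accepts.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate verification enabled."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption for this sketch: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)` or `http.client.HTTPSConnection(host, context=context)`, so every request made through it travels over a verified, encrypted connection.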

Our microservice architecture is divided into access zones, which are protected by a host-based firewall and microservice authentication. Access control is guided by our information security system.

How we process PII

Apart from strict security measures dictated by regulatory bodies, we use a combination of our own methods for PII management.

PII of Tolokers

We restrict access to the PII of Tolokers. Only a handful of employees can access Toloker PII in special circumstances as part of their job responsibilities.

We encourage requesters to explain to Tolokers how they will use the data that they collect, and we offer templates for many types of data collection tasks to help requesters do this correctly.

We give requesters tools to choose which Tolokers can access their tasks. For full transparency, requesters can always see the hash_id of Tolokers who are labeling their tasks. All other data processing is automated so that Toloka employees don’t have access to project data.
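
This article does not say how a `hash_id` is derived. One common way to produce a pseudonymous identifier of this kind is a keyed hash of the internal account ID, sketched below in Python. The function name `make_hash_id`, the key, and the sample ID are hypothetical and only illustrate the idea.

```python
import hashlib
import hmac


def make_hash_id(internal_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous ID from an internal account ID.

    Illustrative only: with a keyed hash (HMAC-SHA256), the same account
    always maps to the same hash_id, but the mapping cannot be reproduced
    without the secret key.
    """
    digest = hmac.new(secret_key, internal_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; a real key would be kept in a secret store.
key = b"example-secret-key"
print(make_hash_id("toloker-1001", key))
```

The requester sees a consistent identifier for each Toloker, which is enough to filter or track performers, without learning who the person behind it is.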

Finally, we delete all PII that belongs to Tolokers when their accounts are deleted. We also delete all project data if the requester asks us to. We call it the “Wipe-out privilege”.

Cookies

We’re transparent about cookies and tracking. Every Toloka visitor can set their cookie preferences and decline our tracking request. There are three types of cookies on our website:

  • Session cookies, which are needed for the website to function properly.
  • Analytical cookies, which are used to prepare statistics on user behavior and demographics.
  • Marketing cookies, which are used to make offers and advertise the product.
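
The three categories above map naturally onto a small consent model: session cookies are always allowed, while analytical and marketing cookies depend on the visitor's choice. The Python sketch below illustrates this. The class and function names are hypothetical, and defaulting both optional categories to "declined" is an assumption; none of this describes Toloka's actual implementation.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class CookiePreferences:
    """A visitor's consent choices (assumed to default to declined)."""
    analytical: bool = False
    marketing: bool = False


def allowed_categories(prefs: CookiePreferences) -> Set[str]:
    """Return the cookie categories that may be set for this visitor."""
    allowed = {"session"}  # required for the website to function
    if prefs.analytical:
        allowed.add("analytical")
    if prefs.marketing:
        allowed.add("marketing")
    return allowed


print(allowed_categories(CookiePreferences()))  # {'session'}
print(sorted(allowed_categories(CookiePreferences(analytical=True))))
```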

Field tasks

PII protection measures are particularly important when it comes to field tasks. These are tasks that ask Tolokers to go to a location on the map and submit photo evidence, such as photos of a business at a certain address. Field tasks raise unique privacy concerns — when Tolokers take photos on the street and happen to capture random pedestrians or private vehicles, they could be obtaining PII from third parties without their consent. This is why we detect and blur all license plate numbers and faces as the data comes in. This is a native feature built into our mobile app.
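
Detecting faces and license plates requires a trained model, but the redaction step that follows is easy to illustrate. The Python sketch below replaces every pixel inside a bounding box with the mean intensity of that box, which is a cruder stand-in for a real blur. The function name, the list-of-rows grayscale image, and the bounding box are all assumptions for illustration, not a description of the mobile app's implementation.

```python
def redact_region(image, box):
    """Return a copy of `image` with the pixels inside `box` set to their mean.

    `image` is a grayscale image given as a list of rows of ints.
    `box` is (top, left, bottom, right); bottom and right are exclusive,
    and the box must contain at least one pixel.
    """
    top, left, bottom, right = box
    region = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    mean = sum(region) // len(region)
    redacted = [row[:] for row in image]  # copy so the input is not mutated
    for r in range(top, bottom):
        for c in range(left, right):
            redacted[r][c] = mean
    return redacted


photo = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
# Hypothetical bounding box, as a face or plate detector might return it.
print(redact_region(photo, (0, 1, 2, 3)))
# [[10, 45, 45, 40], [50, 45, 45, 80], [90, 100, 110, 120]]
```

Applying the redaction on the device, before upload, means the original pixels never have to leave the phone.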

Protection against unauthorized access

We use several tried-and-true methods to prevent unauthorized access to your data:

  • SSO (single sign-on) everywhere on our system.
  • MFA (multi-factor authentication) for accessing internal infrastructure from the outside.
  • TSA (two-step authentication) for access to production services like Azure.
  • SDLC (secure development lifecycle), a process that governs security principles of software development.
  • SAST (static application security testing), a type of tool that scans application source code for security flaws and coding errors without running it.
  • DAST (dynamic application security testing), a mechanism that tests applications as they run to identify vulnerabilities and other shortcomings.
  • Security agents and performance monitoring on laptops/desktops, overseen by our SOC (security operations center).
  • First-class laptop/desktop encryption.
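
This article does not specify which second factor is used for MFA. As one widely used example, the Python sketch below implements time-based one-time passwords (TOTP, RFC 6238), the scheme behind most authenticator apps. The function name and parameters are illustrative assumptions.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for the given Unix time."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time // step)
    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test secret; at Unix time 59 the 6-digit code is 287082.
print(totp(b"12345678901234567890", at_time=59))
```

Because the code depends on both a shared secret and the current time, a stolen password alone is not enough to sign in.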

To ensure best practices on an individual level, all Toloka employees complete these required courses:

  • Development Security.
  • Advanced Anti-Phishing Tools.
  • NDA (Non-Disclosure Agreement) Principles.

Audits and security certificates

Since we work with large IT companies, we make sure to comply with all major information security regulations. This includes undergoing regular audits and attaining specialized certificates. Plus, we spend a great deal of time studying and implementing our clients' security requirements.

Toloka is already certified to ISO 27001, the most fundamental international standard for managing information security. This certificate means we have the right tools in place for managing our data securely 24/7. Since this is an ongoing process, we run our own ISO 27001 internal audits every quarter to make sure the standards are maintained. This involves conducting risk assessments and finding ways to address any issues.

We’re also working to comply with:

  • ISO 27701, a privacy extension to ISO 27001 that deals with managing PII. It helps the company operate in accordance with GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act), among other privacy regulations.
  • HIPAA (Health Insurance Portability and Accountability Act), an American federal law that requires medical data to be protected from fraud, theft, and accidental leaks. We are adapting our processes to meet HIPAA requirements so that we can safely handle medical data.
  • SOC 2/3 (System and Organization Controls 2/3), a series of comprehensive audits that examine all processes and procedures in tech companies that offer online services. In essence, this is an auditing framework from the AICPA that deals with cybersecurity risk management.

Accredited auditors also run controlled penetration tests (ethical hacking into our system) on a regular basis to identify vulnerabilities that a targeted cyberattack could exploit. The last audit of this type was conducted by KPMG in April of this year.

There’s more

If you want to learn more about our security measures, visit our security page. Please get in touch with us on Slack if you have any questions or comments – we always appreciate your feedback.
