Premium Coding Data For Your Model or Agent

Turn your agent or model into an expert coder with curated custom datasets.

Trusted by Leading ML & AI Teams:

Tap Into a Network of Expert Programmers

Our vast network of vetted coding experts offers scalable data production for the languages,
coding domains, and programming expertise of your choice.

Professional Profiles

Backend engineers

Frontend engineers

Mobile developers

Data scientists

Software architects

ML engineers

DevOps specialists

QA testers

Frameworks

Django

Flask

React

Angular

Node.js

.NET

Spark

Hadoop

Kafka

Kubernetes

Docker

TensorFlow

PyTorch

Vue.js

Next.js

Keras

Coding Languages

Python

C++

C#

Go

Ruby

Rust

JavaScript

TypeScript

CSS

Java

Kotlin

Scala

PHP

Bash

SQL

Swift

Dart

R

MATLAB

Solidity

Spoken Languages

English

French

German

Spanish

Hindi

Malay

Russian

Bengali

Filipino

Ukrainian

Vietnamese

Tamil

Japanese

Korean

Dutch

Thai

Enhance Core Skills for Coding Models

Refine foundation model capabilities for solving coding tasks and building advanced solutions with our custom datasets.

Skills and scenarios to develop

Code Generation

Code Understanding

Code Testing

Code Analysis

Tool Usage

Case study

Preference Labeling for Code Explanation

Client type:

Big tech

Experts:

Professional Profiles:

Backend engineers

Frontend engineers

Mobile developers

Coding Languages:

C

C++

C#

Go

Python

Java

Scala

JavaScript

TypeScript

Kotlin

Ruby

PHP

Rust

Spoken Languages:

English

French

German

Spanish

Application:

Improving code understanding and explanation capabilities for foundational coding model

Volume:

10,000 preference pairs

2,000 per week

View case details

Explain the is_allowed method of the MultiTierRateLimiter class in the provided code snippet.

import time
import uuid

class MultiTierRateLimiter:
    def __init__(self, client, default_tiers):
        self.client = client  # a redis.Redis (or compatible) client
        self.default_tiers = default_tiers  # {tier: (max_requests, period)}

    def get_tier(self, user_id):
        # Simulated method to fetch a user's tier
        return "premium" if user_id.startswith("premium_") else "standard"

    def is_allowed(self, user_id):
        tier = self.get_tier(user_id)
        max_requests, period = self.default_tiers.get(tier, (10, 60))  # Default to 10 requests per minute
        current_time = time.time()
        key = f"rate_limiter:{tier}:{user_id}"
        pipeline = self.client.pipeline()

        # Sliding-window bookkeeping: evict timestamps outside the window,
        # record this request under a unique member (so requests in the same
        # second are not collapsed into one), then count the window.
        pipeline.zremrangebyscore(key, 0, current_time - period)
        pipeline.zadd(key, {f"{current_time}:{uuid.uuid4().hex}": current_time})
        pipeline.zcard(key)
        pipeline.expire(key, period)

        _, _, request_count, _ = pipeline.execute()

        # Dynamic penalty once requests exceed the threshold: the first breach
        # starts a penalty window; every request during it is rejected.
        if request_count > max_requests:
            penalty_key = f"penalty:{user_id}"
            if not self.client.get(penalty_key):  # First breach starts the penalty window
                self.client.setex(penalty_key, period, 1)
            return False
        return True

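For context, a minimal usage sketch (assuming a local Redis server; the tier limits and user ID below are illustrative, not taken from the case study):

import redis

# Hypothetical tier configuration: {tier: (max_requests, period_in_seconds)}
tiers = {"standard": (10, 60), "premium": (100, 60)}

limiter = MultiTierRateLimiter(redis.Redis(), tiers)
print(limiter.is_allowed("premium_alice"))  # True until the premium limit is exceeded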

Advance Coding Agents for Complex, Long-Horizon Tasks

Empower coding agents and assistants to excel in end-to-end tasks requiring step-by-step reasoning and autonomous behavior. From environment interaction to code writing, testing, and error analysis—our curated data drives agentic capabilities.

Skills and scenarios to develop

Repository Generation & Prototyping

Repository Issue Resolution & Code Review

Multi-Turn Chat Assistance

Visual Frontend Development

Data Analysis

Case study

Repository Issue Resolution

Skill:

Repository Issue Resolution (Pull Request Generation)

Data type:

Agent Trajectory Evaluation

Experts:

Professional Profiles:

Backend engineers

Software architects

DevOps engineers

Coding Languages:

Python

Java

C++

C#

Rust

Spoken Languages:

English

Domain or Application:

Coding agent for repository maintenance and bug-fixing tasks

Client type:

Coding AI agents startup

Volume:

5,000 trajectories

500 per week

View case details

Issue: from_json does not correctly convert BulkDataURI's in SQ data elements #1254 (bug)

Title: from_json does not correctly convert BulkDataURI's in SQ data elements

Body:

  • Describe the bug: When a DICOM object contains large data elements in SQ elements and is converted to JSON, those elements are correctly turned into BulkDataURI's. However, when the JSON is converted back to DICOM using from_json, the BulkDataURI's in SQ data elements are not converted back and warnings are thrown. The problem is in jsonrep.py at line 227.

  • Expected behavior: The BulkDataURI's in SQ data elements get converted back correctly.

  • Steps To Reproduce: Take the waveform_ecg.dcm in the test data, convert it to JSON, and then convert the JSON to DICOM

The first step is to reproduce the bug. We will convert a DICOM object to JSON and back to DICOM to check how `BulkDataURIs` in `SQ` elements are processed. The issue occurs in `jsonrep.py` at line 227. Before reviewing the code, let's confirm the `waveform_ecg.dcm` file is available.

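A minimal reproduction sketch of that step, assuming pydicom's public API (`dcmread`, `get_testdata_file`, `Dataset.to_json`, `Dataset.from_json`); the two callbacks below are hypothetical placeholders, not part of the issue report:

from pydicom import dcmread
from pydicom.data import get_testdata_file
from pydicom.dataset import Dataset

# Load the test file referenced in the issue.
ds = dcmread(get_testdata_file("waveform_ecg.dcm"))

# Serialize to JSON; element values above the threshold are replaced
# with BulkDataURIs produced by the handler.
json_str = ds.to_json(
    bulk_data_threshold=1024,
    bulk_data_element_handler=lambda elem: "http://example.com/bulkdata",  # placeholder URI
)

# Round-trip back to DICOM; per the issue, BulkDataURIs inside SQ data
# elements are not passed to the handler and warnings are emitted instead.
roundtripped = Dataset.from_json(
    json_str,
    bulk_data_uri_handler=lambda uri: b"\x00\x00",  # placeholder resolver
)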

Data Solutions

Demonstrations Generation
Preference Labeling
Task Collection
Trajectory Annotation
Synthetic Data Enhancement
Evaluation Datasets
Red Teaming


Enhance your coding AI models with curated datasets