If tasks were issued with an overlap of 2 or higher, run aggregation of results. Toloka will process all Tolokers' responses for the task and issue the resulting response and its confidence level.
Open the pool.
Click next to the Download results button.
Choose the aggregation method:
Aggregation takes from several minutes to several hours. Track the progress on the Operations page. When aggregation is complete, download the file with the results.
To receive notifications and emails when results aggregation is completed, set up notifications:
Log in to your account.
Go to Profile → Notifications → Pool or aggregation completed.
Choose the notification method:
Email: Messages will be sent to your email address.
Messages: Notifications will be displayed under Messages in your account. Apart from you, those who set up shared access to your account can see them.
Browser: Notifications will be sent to the devices that you logged in to your account from.
The Dawid-Skene aggregation model takes into account the heterogeneity of Tolokers when aggregating responses. Statistical significance of the resulting response is determined based on the analysis of all Tolokers' responses.
The model evaluates |L|² parameters for each Toloker, where L is the number of all unique aggregation values.
The parameters used by the model are determined automatically for each pool and are used only in calculations. You won't see these parameters in the aggregated results.
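To make the per-Toloker parameters more concrete, here is a minimal sketch of the classic Dawid-Skene expectation-maximization procedure in plain Python. It is an illustration of the general technique, not Toloka's implementation: the function name, the (toloker, task, label) tuple layout, the smoothing constant, and the iteration count are all assumptions. The |L|² parameters per Toloker appear here as a confusion matrix: for every true label, the probability of the Toloker reporting each possible label.

```python
def dawid_skene(responses, labels, n_iter=20, smoothing=0.01):
    """Sketch of Dawid-Skene EM.

    responses: list of (toloker, task, label) tuples (hypothetical layout).
    Returns {task: {label: posterior probability}}.
    """
    labels = sorted(labels)
    tasks = sorted({t for _, t, _ in responses})
    tolokers = sorted({w for w, _, _ in responses})

    # Start from per-task vote shares (a soft majority vote).
    post = {}
    for task in tasks:
        votes = [l for _, t, l in responses if t == task]
        post[task] = {l: votes.count(l) / len(votes) for l in labels}

    for _ in range(n_iter):
        # M-step: label priors and one |L| x |L| confusion matrix per Toloker.
        prior = {l: sum(post[t][l] for t in tasks) / len(tasks) for l in labels}
        conf = {w: {true: {l: smoothing for l in labels} for true in labels}
                for w in tolokers}
        for w, t, l in responses:
            for true in labels:
                conf[w][true][l] += post[t][true]
        for w in tolokers:
            for true in labels:
                total = sum(conf[w][true].values())
                for l in labels:
                    conf[w][true][l] /= total

        # E-step: recompute the posterior over true labels for each task.
        raw = {t: dict(prior) for t in tasks}
        for w, t, l in responses:
            for true in labels:
                raw[t][true] *= conf[w][true][l]
        for t in tasks:
            norm = sum(raw[t].values())
            post[t] = {l: p / norm for l, p in raw[t].items()}
    return post


# Two Tolokers who agree with each other, and one who always answers "cat".
responses = [
    ("w1", "t1", "cat"), ("w1", "t2", "dog"), ("w1", "t3", "cat"), ("w1", "t4", "dog"),
    ("w2", "t1", "cat"), ("w2", "t2", "dog"), ("w2", "t3", "cat"), ("w2", "t4", "dog"),
    ("w3", "t1", "cat"), ("w3", "t2", "cat"), ("w3", "t3", "cat"), ("w3", "t4", "cat"),
]
posteriors = dawid_skene(responses, {"cat", "dog"})
print({t: max(p, key=p.get) for t, p in posteriors.items()})
```

In this toy data the model learns that the third Toloker answers "cat" regardless of the task, so that Toloker's responses carry little weight in the final posterior.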
Because the Dawid-Skene model evaluates |L|² parameters for each Toloker, we don't recommend using it when a Toloker labels fewer than |L|² tasks. Otherwise, the quality of aggregation may be poor.
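The recommendation above can be checked before running aggregation. The sketch below counts distinct tasks per Toloker and reports anyone below the |L|² threshold. The function name and the (toloker_id, task_id, label) tuple layout are hypothetical; adapt them to however you store responses.

```python
from collections import Counter


def tolokers_below_threshold(responses, labels):
    """Return {toloker_id: task_count} for Tolokers with fewer than |L|^2 tasks.

    responses: iterable of (toloker_id, task_id, label) tuples (assumed layout).
    labels: the set of unique aggregation values.
    """
    threshold = len(labels) ** 2
    tasks_per_toloker = Counter()
    seen = set()
    for toloker_id, task_id, _label in responses:
        # Count each (Toloker, task) pair once.
        if (toloker_id, task_id) not in seen:
            seen.add((toloker_id, task_id))
            tasks_per_toloker[toloker_id] += 1
    return {t: n for t, n in tasks_per_toloker.items() if n < threshold}


responses = [
    ("w1", "t1", "cat"), ("w1", "t2", "dog"), ("w1", "t3", "cat"),
    ("w1", "t4", "dog"), ("w1", "t5", "cat"),
    ("w2", "t1", "cat"), ("w2", "t2", "dog"),
]
sparse = tolokers_below_threshold(responses, {"cat", "dog"})
print(sparse)  # {'w2': 2}
```

With two labels the threshold is 4 tasks, so only the second Toloker is flagged.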
The result of aggregation is a TSV file with responses. CONFIDENCE: <field name output> indicates the response significance as a percentage.
Data for aggregation can be uploaded any way you want.
The Dawid-Skene model is a non-trivial aggregation algorithm. Check out its features and learn more about the model.
The method doesn't guarantee that original Toloker responses will be used for aggregation. The algorithm takes into account Tolokers' quality parameters and response patterns. Consequently, it can produce a result that's different from the Tolokers' responses to this task.
The Dawid-Skene aggregation model works with control and training tasks as well as with general tasks. The OUTPUT:result field for a control task in the TSV file might not match the actual response to this task (GOLDEN:result).
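One way to see how often this happens is to scan the results file for control tasks where the aggregated value differs from the known answer. The sketch below uses Python's csv module; the sample rows and the INPUT:image and CONFIDENCE:result columns are made up for illustration, and the real file may have a different column layout.

```python
import csv
import io


def control_mismatches(tsv_text, field="result"):
    """Return rows where OUTPUT:<field> differs from a non-empty GOLDEN:<field>."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out_col, gold_col = f"OUTPUT:{field}", f"GOLDEN:{field}"
    mismatches = []
    for row in reader:
        gold = row.get(gold_col) or ""
        # Rows without a GOLDEN value are general tasks; skip them.
        if gold and row.get(out_col) != gold:
            mismatches.append(row)
    return mismatches


# Hypothetical sample of an aggregated results file.
sample = (
    "INPUT:image\tOUTPUT:result\tGOLDEN:result\tCONFIDENCE:result\n"
    "a.jpg\tcat\tcat\t98.5%\n"
    "b.jpg\tdog\tcat\t61.0%\n"
    "c.jpg\tdog\t\t87.2%\n"
)
for row in control_mismatches(sample):
    print(row["INPUT:image"], row["OUTPUT:result"], row["GOLDEN:result"])
```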
If your project has output data marked as "required": false and Tolokers don't fill in this field, it won't be included in aggregation.
For example, you have 1000 tasks. In 999 of them, Tolokers didn't fill in the label field, and one Toloker labeled it as label=x. As a result of aggregation, this data field will have CONFIDENCE = 100%, since only one task out of a thousand falls under the aggregation conditions.
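The arithmetic behind this example can be sketched as follows. The snippet uses a plain vote share as a stand-in for the real model, purely to show why dropping empty responses leaves a single data point; the helper name and the dict layout are assumptions.

```python
def eligible_responses(responses, field):
    """Keep only responses where the optional field was actually filled in."""
    return [r[field] for r in responses if r.get(field) not in (None, "")]


# 999 responses with the optional field left empty, one with label=x.
responses = [{"label": None} for _ in range(999)] + [{"label": "x"}]

eligible = eligible_responses(responses, "label")
share = eligible.count("x") / len(eligible)
print(len(eligible), f"{share:.0%}")  # 1 100%
```

A confidence of 100% here reflects the lack of competing responses, not strong agreement among Tolokers.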
Aggregation only includes accepted tasks.
The main requirement for this aggregation is the output data fields:
Strings and numbers with allowed values.
The allowed value must match the value parameter in the corresponding interface element.
Boolean.
Integers with minimum and maximum values. The maximum difference between them is 32.
If there are too many possible responses in the output field, the dynamic overlap mechanism won't be able to aggregate the data.
The allowed value must match the value parameter in the corresponding interface element.
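The field requirements listed above can be expressed as a small check. This is a sketch only: the dict keys type, allowed_values, min_value, and max_value are assumed names for an output field description and may not match your project's specification exactly.

```python
def is_aggregatable(spec):
    """Check an output field description against the requirements above.

    spec: dict describing one output field (key names are assumptions).
    """
    kind = spec.get("type")
    if kind == "boolean":
        return True
    # Strings and numbers qualify when they list allowed values.
    if kind in ("string", "integer", "float") and spec.get("allowed_values"):
        return True
    # Integers also qualify with a bounded range of at most 32.
    if kind == "integer":
        low, high = spec.get("min_value"), spec.get("max_value")
        return low is not None and high is not None and 0 <= high - low <= 32
    return False


print(is_aggregatable({"type": "string", "allowed_values": ["cat", "dog"]}))  # True
print(is_aggregatable({"type": "integer", "min_value": 0, "max_value": 100}))  # False
```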
If you have doubts that the Dawid-Skene aggregation model works correctly, you can:
Aggregation by skill analyzes responses based on the level of confidence in each Toloker. The confidence level is determined by the skill you choose. Skills measure the probability of the Toloker completing the task correctly.
Each user skill has a "weight". The higher the skill, the more we trust the Toloker and believe that their responses are correct.
The result of aggregation is a TSV file with responses. CONFIDENCE: <field name output> indicates the confidence in the aggregated response. In this case, it shows the probability that the response is correct.
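To illustrate how skill values can act as weights, here is a minimal sketch of a skill-weighted vote. It is not Toloka's documented formula. It assumes that a skill of s percent means the Toloker answers correctly with probability s/100, that mistakes are spread evenly over the other labels, and that all labels are equally likely beforehand. The function name and the eps clamp are also assumptions.

```python
def aggregate_by_skill(votes, labels, eps=0.01):
    """Return (best_label, confidence) for one task.

    votes: list of (label, skill_percent) pairs (hypothetical layout).
    labels: all possible values of the output field (at least two).
    """
    k = len(labels)
    scores = {}
    for candidate in labels:
        likelihood = 1.0
        for label, skill in votes:
            # Clamp so that a skill of 0 or 100 never zeroes out a candidate.
            q = min(max(skill / 100.0, eps), 1.0 - eps)
            likelihood *= q if label == candidate else (1.0 - q) / (k - 1)
        scores[candidate] = likelihood
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# One Toloker with skill 90 says "cat"; two with skill 60 say "dog".
best, confidence = aggregate_by_skill(
    [("cat", 90), ("dog", 60), ("dog", 60)], ["cat", "dog"]
)
print(best, round(confidence, 3))  # cat 0.8
```

Under these assumptions the single high-skill response outweighs two lower-skill responses, which is the behavior the weighting is meant to capture.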
Aggregation only includes accepted tasks.
To run aggregation, you must correctly set up dynamic overlap. To do this:
Select a skill. We recommend selecting a skill calculated as the percentage of correct responses in control tasks. This will give you the most accurate aggregation results.
Select the output data fields.
Last updated: August 3, 2023