Toloka documentation


crowdkit.aggregation.embeddings.closest_to_average.ClosestToAverage.fit_predict_scores

fit_predict_scores(
    data: DataFrame,
    aggregated_embeddings: Optional[Series] = None
)

Fit the model and return the estimated scores.

Parameters Description

data (DataFrame): Workers' outputs with their embeddings. A pandas.DataFrame containing task, worker, output and embedding columns.

aggregated_embeddings (Optional[Series]): Tasks' embeddings. A pandas.Series indexed by task and holding the corresponding embeddings.
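The following is a minimal sketch of how the two inputs described above might be assembled and passed to the method, assuming pandas and NumPy are available. The toy values, the helper name score_outputs and the Euclidean distance are illustrative assumptions. The sketch also assumes that ClosestToAverage accepts a distance callable at construction, which this page does not state.

```python
import numpy as np
import pandas as pd

# Toy input: two tasks, each answered by two workers. All values are made up.
data = pd.DataFrame(
    {
        "task": ["t1", "t1", "t2", "t2"],
        "worker": ["w1", "w2", "w1", "w2"],
        "output": ["a cat", "a dog", "red car", "blue car"],
        "embedding": [
            np.array([1.0, 0.0]),
            np.array([0.0, 1.0]),
            np.array([1.0, 1.0]),
            np.array([3.0, 1.0]),
        ],
    }
)

# Optional precomputed per-task embeddings, indexed by task.
aggregated_embeddings = pd.Series(
    [np.array([0.5, 0.5]), np.array([2.0, 1.0])],
    index=pd.Index(["t1", "t2"], name="task"),
    name="embedding",
)


def score_outputs(data, aggregated_embeddings=None):
    """Hypothetical helper: fit ClosestToAverage and return its scores.

    Requires Crowd-Kit to be installed. The `distance` constructor argument
    and the Euclidean metric are assumptions of this sketch.
    """
    from crowdkit.aggregation.embeddings.closest_to_average import ClosestToAverage

    def euclidean(a, b):
        # Distance between a worker's embedding and the task's embedding.
        return float(np.linalg.norm(a - b))

    model = ClosestToAverage(distance=euclidean)
    return model.fit_predict_scores(data, aggregated_embeddings)
```

With Crowd-Kit installed, score_outputs(data) would let the model derive the task embeddings itself, while score_outputs(data, aggregated_embeddings) would supply them explicitly.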

  • Returns:

    Tasks' label probability distributions. A pandas.DataFrame indexed by task such that result.loc[task, label] is the probability that the true label of task is label. Each probability is between 0 and 1, and the probabilities for each task should sum to 1.

  • Return type:

    DataFrame