ClosestToAverage

crowdkit.aggregation.embeddings.closest_to_average.ClosestToAverage

ClosestToAverage(self, distance: Callable[[..., ...], float])

The Closest to Average aggregation model chooses the output with the embedding that's closest to the average embedding.

This method takes a DataFrame containing four columns: task, worker, output, and embedding. Here the embedding is a vector representation of the output, which may be any type of data, such as text, images, or NumPy arrays. For each task, the method returns the output whose embedding is closest to the average embedding of that task's responses.
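The selection rule described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not Crowd-Kit's implementation: the helper `closest_to_average`, the `euclidean` distance, and the sample responses are all hypothetical and exist only for this example.

```python
import numpy as np


def euclidean(a, b):
    """Illustrative distance: Euclidean norm of the difference (an assumption)."""
    return float(np.linalg.norm(a - b))


def closest_to_average(responses, distance=euclidean):
    """For each task, pick the output whose embedding is closest to the task's mean embedding.

    `responses` is a list of (task, worker, output, embedding) tuples that
    mirrors the four columns described above. Hypothetical helper, not a
    Crowd-Kit function.
    """
    # Group (output, embedding) pairs by task.
    by_task = {}
    for task, _worker, output, embedding in responses:
        by_task.setdefault(task, []).append((output, np.asarray(embedding, dtype=float)))

    result = {}
    for task, items in by_task.items():
        # Average embedding over all responses to this task.
        average = np.mean([emb for _, emb in items], axis=0)
        # Keep the output whose embedding has the smallest distance to the average.
        best_output, _ = min(items, key=lambda item: distance(item[1], average))
        result[task] = best_output
    return result


# Made-up responses: three workers answer two tasks.
responses = [
    ("t1", "w1", "a cat", [1.0, 0.0]),
    ("t1", "w2", "a kitten", [0.8, 0.2]),
    ("t1", "w3", "a dog", [0.0, 1.0]),
    ("t2", "w1", "red", [0.0, 0.0]),
    ("t2", "w2", "crimson", [1.0, 1.0]),
    ("t2", "w3", "scarlet", [2.0, 5.0]),
]
aggregated = closest_to_average(responses)
print(aggregated)
```

In this toy data the average embedding for `t1` is `[0.6, 0.4]`, so "a kitten" is selected. For `t2` the average is `[1.0, 2.0]`, so "crimson" is selected.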

Parameters description

Parameters	Type	Description
`distance`	Callable[[..., ...], float]	A callable that takes two NumPy arrays (the task embedding and the aggregated embedding) and returns a single `float` number — the distance between these two vectors.
`embeddings_and_outputs_`	DataFrame	The task embeddings and outputs. The `pandas.DataFrame` data is indexed by `task` and has the `embedding` and `output` columns.
`scores_`	DataFrame	The task label scores. The `pandas.DataFrame` data is indexed by `task` so that `result.loc[task, label]` is a score of `label` for `task`.
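As an illustration of the `distance` parameter, the sketch below shows one callable that fits the described contract: it takes two NumPy arrays and returns a single `float`. Cosine distance is only an example choice, and the name `cosine_distance` and its handling of zero vectors are assumptions rather than part of Crowd-Kit.

```python
import numpy as np


def cosine_distance(task_embedding, aggregated_embedding):
    """Return 1 - cosine similarity between two vectors as a plain float."""
    a = np.asarray(task_embedding, dtype=float)
    b = np.asarray(aggregated_embedding, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Convention chosen for this sketch: treat a zero vector as maximally distant.
        return 1.0
    return float(1.0 - np.dot(a, b) / denom)


# Parallel vectors are at distance ~0; orthogonal vectors are at distance 1.
d_same = cosine_distance(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
d_orth = cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(d_same, d_orth)
```

Given the constructor signature above, a function like this would presumably be passed as `ClosestToAverage(distance=cosine_distance)`.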

Methods summary

Method	Description
fit	Fits the model to the training data.
fit_predict	Fits the model to the training data and returns the aggregated outputs.
fit_predict_scores	Fits the model to the training data and returns the estimated scores.

Last updated: March 31, 2023
