uncertainty

crowdkit.metrics.data._classification.uncertainty

uncertainty(
    answers: DataFrame,
    workers_skills: Optional[Series] = None,
    aggregator: Optional[BaseClassificationAggregator] = None,
    compute_by: str = 'task',
    aggregate: bool = True
)

Label uncertainty metric: the entropy of the label probability distribution.

Computed as Shannon's entropy, with label probabilities computed either per task or per worker:

H(L) = -\sum_{label_i \in L} p(label_i) \cdot \log(p(label_i))
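The formula can be checked by hand with a few lines of standard-library Python. This is a minimal sketch, not the library's implementation: the helper name `label_entropy` is hypothetical, it assumes the natural logarithm, and it assumes each label's probability is simply its share of the answers (no worker skills, no aggregator).

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy of the empirical label distribution (hypothetical helper).

    Assumption: p(label_i) is the fraction of answers equal to label_i,
    and the logarithm is natural.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # -p * log(p) is written as p * log(1 / p) so that a single-label
    # distribution yields 0.0 rather than -0.0.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# All answers agree: the distribution has one outcome, so the entropy is 0.0.
print(label_entropy(["Yes", "Yes"]))

# Three different answers: uniform over 3 labels, so the entropy is ln(3),
# approximately 1.0986.
print(label_entropy(["Yes", "No", "Maybe"]))
```

Under these assumptions the entropy ranges from 0 (full agreement) to the logarithm of the number of distinct labels (answers spread evenly).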

Parameters

Parameter | Type | Description
answers | DataFrame | Workers' labeling results. A pandas.DataFrame containing task, worker and label columns.
workers_skills | Optional[Series] | Workers' skills. A pandas.Series indexed by workers and holding each worker's skill.

  • Returns:

    Union[float, pd.Series]

  • Return type:

    Union[float, Series]

Examples:

Mean task uncertainty is minimal, as all answers to the task are the same.

import pandas as pd
from crowdkit.metrics.data import uncertainty

uncertainty(pd.DataFrame.from_records([
    {'task': 'X', 'worker': 'A', 'label': 'Yes'},
    {'task': 'X', 'worker': 'B', 'label': 'Yes'},
]))

Mean task uncertainty is maximal, as all answers to the task are different.

uncertainty(pd.DataFrame.from_records([
    {'task': 'X', 'worker': 'A', 'label': 'Yes'},
    {'task': 'X', 'worker': 'B', 'label': 'No'},
    {'task': 'X', 'worker': 'C', 'label': 'Maybe'},
]))

Uncertainty by task without averaging.

uncertainty(pd.DataFrame.from_records([
    {'task': 'X', 'worker': 'A', 'label': 'Yes'},
    {'task': 'X', 'worker': 'B', 'label': 'No'},
    {'task': 'Y', 'worker': 'A', 'label': 'Yes'},
    {'task': 'Y', 'worker': 'B', 'label': 'Yes'},
]),
workers_skills=pd.Series([1, 1], index=['A', 'B']),
compute_by="task", aggregate=False)

Uncertainty by worker without averaging.

uncertainty(pd.DataFrame.from_records([
    {'task': 'X', 'worker': 'A', 'label': 'Yes'},
    {'task': 'X', 'worker': 'B', 'label': 'No'},
    {'task': 'Y', 'worker': 'A', 'label': 'Yes'},
    {'task': 'Y', 'worker': 'B', 'label': 'Yes'},
]),
workers_skills=pd.Series([1, 1], index=['A', 'B']),
compute_by="worker", aggregate=False)
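To make the roles of `compute_by` and `aggregate` concrete, here is a rough standard-library sketch of the grouping they describe. It is an illustration under stated assumptions, not the library's code: the function `entropy_by` is hypothetical, it treats every worker as fully skilled (so a label's probability is its share of answers within the group), it ignores `workers_skills` and `aggregator`, and it uses the natural logarithm.

```python
import math
from collections import Counter, defaultdict


def entropy_by(records, compute_by="task", aggregate=True):
    """Entropy of labels grouped by 'task' or 'worker' (hypothetical helper).

    Assumption: label probabilities are empirical frequencies within each
    group; worker skills and aggregators are not modelled here.
    """
    # Collect the labels belonging to each task (or each worker).
    groups = defaultdict(list)
    for record in records:
        groups[record[compute_by]].append(record["label"])

    # Shannon entropy of the label shares inside each group.
    per_group = {}
    for key, labels in groups.items():
        total = len(labels)
        per_group[key] = sum(
            (c / total) * math.log(total / c) for c in Counter(labels).values()
        )

    # aggregate=True returns the mean over groups, otherwise one value per group.
    if aggregate:
        return sum(per_group.values()) / len(per_group)
    return per_group


records = [
    {"task": "X", "worker": "A", "label": "Yes"},
    {"task": "X", "worker": "B", "label": "No"},
    {"task": "Y", "worker": "A", "label": "Yes"},
    {"task": "Y", "worker": "B", "label": "Yes"},
]

# Task X has two different answers (entropy ln 2); task Y has full agreement (0.0).
print(entropy_by(records, compute_by="task", aggregate=False))

# Worker A always answers 'Yes' (0.0); worker B gives two different labels (ln 2).
print(entropy_by(records, compute_by="worker", aggregate=False))
```

Grouping by task measures how much workers disagree on each task, while grouping by worker measures how varied each worker's own answers are across tasks.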