crowdkit.metrics.data._classification.uncertainty
| Source code
uncertainty( answers: DataFrame, workers_skills: Optional[Series] = None, aggregator: Optional[BaseClassificationAggregator] = None, compute_by: str = 'task', aggregate: bool = True)
Label uncertainty metric: entropy of labels probability distribution.
Computed as Shannon's Entropy with label probabilities computed either for tasks or workers:
Parameters | Type | Description |
---|---|---|
answers | DataFrame | Workers' labeling results. A pandas.DataFrame containing |
workers_skills | Optional[Series] | workers' skills. A pandas.Series index by workers and holding corresponding worker's skill |
Returns:
Union[float, pd.Series]
Return type:
Union[float, Series]
Examples:
Mean task uncertainty minimal, as all answers to task are same.
uncertainty(pd.DataFrame.from_records([ {'task': 'X', 'worker': 'A', 'label': 'Yes'}, {'task': 'X', 'worker': 'B', 'label': 'Yes'},]))
Mean task uncertainty maximal, as all answers to task are different.
uncertainty(pd.DataFrame.from_records([ {'task': 'X', 'worker': 'A', 'label': 'Yes'}, {'task': 'X', 'worker': 'B', 'label': 'No'}, {'task': 'X', 'worker': 'C', 'label': 'Maybe'},]))
Uncertainty by task without averaging.
uncertainty(pd.DataFrame.from_records([ {'task': 'X', 'worker': 'A', 'label': 'Yes'}, {'task': 'X', 'worker': 'B', 'label': 'No'}, {'task': 'Y', 'worker': 'A', 'label': 'Yes'}, {'task': 'Y', 'worker': 'B', 'label': 'Yes'},]),workers_skills=pd.Series([1, 1], index=['A', 'B']),compute_by="task", aggregate=False)
X 0.693147 Y 0.000000 dtype: float64
Uncertainty by worker
uncertainty(pd.DataFrame.from_records([ {'task': 'X', 'worker': 'A', 'label': 'Yes'}, {'task': 'X', 'worker': 'B', 'label': 'No'}, {'task': 'Y', 'worker': 'A', 'label': 'Yes'}, {'task': 'Y', 'worker': 'B', 'label': 'Yes'},]),workers_skills=pd.Series([1, 1], index=['A', 'B']),compute_by="worker", aggregate=False)
A 0.000000 B 0.693147 dtype: float64