uncertainty
crowdkit.metrics.data._classification.uncertainty
uncertainty(
answers: DataFrame,
workers_skills: Optional[Series] = None,
aggregator: Optional[BaseClassificationAggregator] = None,
compute_by: str = 'task',
aggregate: bool = True
)
Label uncertainty metric: the entropy of the label probability distribution.
Computed as Shannon's entropy, with label probabilities computed either per task or per worker.
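For reference, Shannon's entropy over a set of labels is conventionally written as below. The symbols are notation chosen here for illustration: L is the set of labels observed for one task (or one worker), and p(ℓ) is the probability of label ℓ within that group. The base of the logarithm is not stated in this section.

$$
H(L) = -\sum_{\ell \in L} p(\ell) \, \log p(\ell)
$$

Under this definition, a group in which every answer is the same label has entropy 0, and a group whose n labels are all different and equally likely has entropy log n.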
Parameters Description
| Parameters | Type | Description |
|---|---|---|
| answers | DataFrame | Workers' labeling results, given as a pandas.DataFrame. The examples below use task, worker, and label columns. |
| workers_skills | Optional[Series] | Workers' skills. A pandas.Series indexed by worker and holding each worker's skill. |
Returns: Union[float, pd.Series]
Return type: Union[float, Series]
Examples:
Mean task uncertainty is minimal, as all answers to the task are the same.
import pandas as pd
from crowdkit.metrics.data import uncertainty

uncertainty(pd.DataFrame.from_records([
    {'task': 'X', 'worker': 'A', 'label': 'Yes'},
    {'task': 'X', 'worker': 'B', 'label': 'Yes'},
]))
Mean task uncertainty is maximal, as all answers to the task are different.
uncertainty(pd.DataFrame.from_records([
    {'task': 'X', 'worker': 'A', 'label': 'Yes'},
    {'task': 'X', 'worker': 'B', 'label': 'No'},
    {'task': 'X', 'worker': 'C', 'label': 'Maybe'},
]))
Uncertainty by task without averaging.
uncertainty(
    pd.DataFrame.from_records([
        {'task': 'X', 'worker': 'A', 'label': 'Yes'},
        {'task': 'X', 'worker': 'B', 'label': 'No'},
        {'task': 'Y', 'worker': 'A', 'label': 'Yes'},
        {'task': 'Y', 'worker': 'B', 'label': 'Yes'},
    ]),
    workers_skills=pd.Series([1, 1], index=['A', 'B']),
    compute_by="task",
    aggregate=False,
)
Uncertainty by worker without averaging.
uncertainty(
    pd.DataFrame.from_records([
        {'task': 'X', 'worker': 'A', 'label': 'Yes'},
        {'task': 'X', 'worker': 'B', 'label': 'No'},
        {'task': 'Y', 'worker': 'A', 'label': 'Yes'},
        {'task': 'Y', 'worker': 'B', 'label': 'Yes'},
    ]),
    workers_skills=pd.Series([1, 1], index=['A', 'B']),
    compute_by="worker",
    aggregate=False,
)
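To make the grouping behind `compute_by` and `aggregate` concrete, here is a minimal sketch that computes the entropy of the empirical label distribution per task or per worker with plain pandas and NumPy. It is not the library's implementation: `label_entropy` and `sketch_uncertainty` are hypothetical helper names, the sketch ignores `workers_skills` and `aggregator` (every answer counts equally), and the natural logarithm is an assumption.

```python
import numpy as np
import pandas as pd


def label_entropy(labels: pd.Series) -> float:
    """Shannon entropy (natural log, an assumption) of the empirical label distribution."""
    # value_counts(normalize=True) yields only labels that occur, so every p > 0.
    probs = labels.value_counts(normalize=True).to_numpy()
    return float(np.sum(probs * np.log(1.0 / probs)))


def sketch_uncertainty(answers: pd.DataFrame, compute_by: str = "task", aggregate: bool = True):
    """Hypothetical, unweighted stand-in: label entropy grouped by task or by worker."""
    per_group = answers.groupby(compute_by)["label"].apply(label_entropy)
    # aggregate=True averages the per-group entropies; otherwise return them all.
    return float(per_group.mean()) if aggregate else per_group


answers = pd.DataFrame.from_records([
    {"task": "X", "worker": "A", "label": "Yes"},
    {"task": "X", "worker": "B", "label": "No"},
    {"task": "Y", "worker": "A", "label": "Yes"},
    {"task": "Y", "worker": "B", "label": "Yes"},
])

by_task = sketch_uncertainty(answers, compute_by="task", aggregate=False)
by_worker = sketch_uncertainty(answers, compute_by="worker", aggregate=False)
mean_by_task = sketch_uncertainty(answers)
```

In this sketch, task X (one Yes, one No) and worker B (one No, one Yes) each get entropy log 2, while task Y and worker A, whose labels all agree, get 0. The real function may weight answers by worker skill, so its values can differ when skills are unequal.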