This dataset is similar to the previous one, but instead of a binary relevance judgment, the Relevance 5 Gradations project uses a five-point scale: the task was to rate the relevance of a document to a query on a 5-point scale. Some tasks in this dataset have more than one golden label; in these cases, all of the golden labels are considered equally correct.
The key quality metric is the accuracy of aggregated labels, estimated as the percentage of aggregated labels that match at least one of the golden labels for the corresponding task in the golden set. In addition to the crowdsourced labels, the dataset includes information about performers who were banned. For each banned performer, the ban reason is given as one of four ban types (details of the individual ban types are not provided). The dataset contains more than 1 million labels.
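The accuracy metric described above can be sketched as follows. This is a minimal illustration, assuming aggregated labels and golden labels are held in plain dictionaries keyed by task id; the field names are hypothetical and not taken from the dataset schema:

```python
def golden_set_accuracy(aggregated, golden):
    """Fraction of golden-set tasks whose aggregated label matches
    at least one of that task's golden labels.

    aggregated: dict mapping task_id -> aggregated label, e.g. "3"
    golden:     dict mapping task_id -> set of golden labels, e.g. {"3", "4"}
    (illustrative structures, not the dataset's actual schema)
    """
    # Only tasks that appear in the golden set count toward accuracy.
    tasks = [t for t in aggregated if t in golden]
    if not tasks:
        return 0.0
    # A match against any one of the (possibly several) golden labels
    # is counted as correct, since all golden labels are equally valid.
    hits = sum(aggregated[t] in golden[t] for t in tasks)
    return hits / len(tasks)

agg = {"t1": "3", "t2": "5", "t3": "1"}
gold = {"t1": {"3", "4"}, "t2": {"4"}, "t3": {"1"}}
print(golden_set_accuracy(agg, gold))  # 2 of 3 tasks match a golden label
```

The only subtlety is the multi-label case: membership in the task's set of golden labels replaces a simple equality check.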