Try Toloka Deep Evaluation

Deep Evaluation empowers your team to align model performance with your expectations
and to ensure that model output is accurate, reliable, and responsible.