Weighted majority vote is a fundamental technique for combining the predictions of multiple classifiers. Its power lies in the cancellation-of-errors effect: when the individual classifiers perform better than random guessing and make independent errors, the errors average out and the majority vote tends to outperform the individual classifiers. Due to this effect, weighted majority vote is often part of the winning strategies in machine learning competitions. It is also widely used in crowdsourcing.
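The cancellation-of-errors effect can be illustrated with a small simulation (a minimal sketch; the classifier count of 101 and per-classifier accuracy of 0.6 are illustrative choices, not values from the paper). Each simulated classifier is correct independently with probability 0.6, and the unweighted majority vote is correct whenever more than half of them are:

```python
import numpy as np

rng = np.random.default_rng(0)

n_classifiers = 101   # illustrative ensemble size
n_samples = 10_000
p_correct = 0.6       # each voter beats random guessing on a binary task

# Simulate independent classifiers: entry is True when that voter is correct.
correct = rng.random((n_samples, n_classifiers)) < p_correct

# The majority vote is correct when more than half the voters are.
vote_correct = correct.sum(axis=1) > n_classifiers / 2

acc_individual = correct.mean()   # close to 0.6
acc_vote = vote_correct.mean()    # far higher, due to error cancellation
```

With these numbers the majority vote's accuracy climbs well above 90%, even though every individual voter sits near 60% — but the gain hinges on the independence assumption, which real ensembles violate.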
In this talk, we present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlations between the predictions of ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of the bound for binary classification, which makes it possible to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first-order bound, minimization of the new bound typically does not degrade the test error of the ensemble.
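For concreteness, the weighted majority vote itself predicts the class receiving the largest total weight among the ensemble members voting for it. The sketch below shows that prediction rule only; how the weights are chosen (e.g., by minimizing the bound discussed in the talk) is a separate question, and the function name and array layout here are illustrative:

```python
import numpy as np

def weighted_majority_vote(preds, weights, n_classes):
    """Combine hard multiclass predictions by weighted vote.

    preds   : (n_classifiers, n_samples) integer class labels
    weights : (n_classifiers,) nonnegative voter weights
    Returns the (n_samples,) array of majority-vote predictions.
    """
    n_samples = preds.shape[1]
    votes = np.zeros((n_classes, n_samples))
    # Accumulate each classifier's weight on the class it predicts.
    for h_preds, w in zip(preds, weights):
        votes[h_preds, np.arange(n_samples)] += w
    # Predict the class with the largest accumulated weight.
    return votes.argmax(axis=0)

# Three classifiers, two samples, three classes.
preds = np.array([[0, 1],
                  [0, 2],
                  [1, 2]])
weights = np.array([0.5, 0.25, 0.25])
print(weighted_majority_vote(preds, weights, n_classes=3))  # [0 1]
```

On the first sample, class 0 collects weight 0.75 and wins; on the second, classes 1 and 2 tie at 0.5 and `argmax` breaks the tie toward the lower class index.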
The talk is based on joint work with Andrés R. Masegosa, Stephan S. Lorenzen, and Christian Igel, published at NeurIPS 2020.