Diversity and generalisation error in classification ensembles
Ivascu, C
Date: 22 April 2024
Thesis or dissertation
Publisher
University of Exeter
Degree Title
PhD in Computer Science
Abstract
Ensembles are important tools in machine learning because they are often more accurate
than single predictors. Although it has been shown that an accurate ensemble would
benefit from having both accurate and diverse predictors, some studies in the literature
could not support the influence that diversity has on the overall accuracy of an ensemble.
In this thesis we investigate the influence that diversity has on improving accuracy
or, equivalently, on reducing the generalisation error.
Many diversity measures have been introduced in the literature; however, as outlined
in [1], the only one that had a strong negative correlation with generalisation error was
a diversity measure called ambiguity. The ambiguity measure was obtained from the
bias-variance decomposition of classifiers under the 0-1 loss. As a result, our first
set of experiments focuses on this type of diversity measure. We analyse the effect that
the ambiguity measure has on decreasing the generalisation error of forests created by
bootstrapping. We compare the effect of the ambiguity by having bootstrapping with or
without replacement, by varying the number of trees, by varying the patterns or features
used in building each tree. Our results show that bootstrapping without replacement
yields lower test errors. A similar effect has been seen on bigger ensembles or by providing
more data to the classifiers. We propose pruning approaches that involve ambiguity and
compare their effect on the generalisation error versus a pruning method that promotes
randomness. Our results show that there is no significant difference between the two types
of approaches.
Next, we define two new ambiguity measures derived from the cross entropy and hinge
loss. We analyse their properties and find that, of the three ambiguity measures defined
for classifiers (including the one derived from the 0-1 loss introduced earlier), the only
one that achieves all the desired properties of a diversity measure is the one obtained
from the cross entropy: it is always non-negative, and zero if and only if all the classifiers
agree. We build ensembles using bagging with varying sampling rates and find a negative
correlation between generalisation error and diversity at high sampling rates; conversely,
generalisation error is positively correlated with diversity when the sampling rate is low
and the diversity is high. We use an evolutionary algorithm to maximise ambiguity
and we find that the evolved ensemble in general has lower generalisation error than the
initial ensemble. We define the term “ambiguous ensembles” as ensembles with high values
of ambiguity. Additionally, we investigate the effect of pruning on larger ensembles and
propose several pruning methods that prioritise ambiguity, as well as others that promote
less ambiguous ensembles. Our results show that the approaches that prefer ambiguous
ensembles reduce the generalisation error. Hence, our overall results support the influence
that diversity has on minimising generalisation error.
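The exact cross-entropy ambiguity used in the thesis is defined in the text; as a hedged sketch of the general form, the classic ambiguity decomposition sets ambiguity equal to the average member loss minus the loss of the combined ensemble. With cross-entropy loss and arithmetic averaging of member probabilities, Jensen's inequality (concavity of the logarithm) makes this quantity non-negative, and it is zero exactly when the members agree, matching the properties stated above.

```python
# Hedged sketch (the thesis's exact definition may differ): ambiguity as
# mean member cross-entropy minus the cross-entropy of the averaged
# ensemble prediction. Non-negative by Jensen's inequality; zero iff the
# members agree on the true-class probabilities.
import numpy as np

def cross_entropy(p, y):
    """Mean cross-entropy of predicted probabilities p (n, classes) vs labels y."""
    eps = 1e-12
    return -np.mean(np.log(p[np.arange(len(y)), y] + eps))

def ambiguity(member_probs, y):
    """member_probs: array of shape (n_members, n_samples, n_classes)."""
    mean_member_loss = np.mean([cross_entropy(p, y) for p in member_probs])
    ensemble_loss = cross_entropy(member_probs.mean(axis=0), y)
    return mean_member_loss - ensemble_loss  # >= 0

# Two members that disagree on the first example:
y = np.array([0, 1])
m1 = np.array([[0.9, 0.1], [0.2, 0.8]])
m2 = np.array([[0.6, 0.4], [0.2, 0.8]])
print(ambiguity(np.stack([m1, m2]), y))  # positive; zero if m1 == m2
```

An evolutionary algorithm as described above would treat a quantity like this as (part of) its fitness function when evolving the ensemble.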
Finally, we define diverse forests by building trees with different impurities. We choose
families of impurities characterised by different parameters and analyse the effect that
the choice of parameters has on generalisation performance. By tuning the parameters
we can define symmetric or asymmetric impurities. For imbalanced datasets, the use of
asymmetric impurities has been shown to be beneficial in predicting the minority class,
which is usually of primary interest. We contrast the behaviour of forests built with
symmetric or asymmetric impurities against forests whose trees are built with different
impurities (different parameters). Our results do not show a significant difference
in performance.
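To make the idea of a parameterised impurity family concrete, here is a purely hypothetical example (the families studied in the thesis are not specified on this page): a Gini-like binary impurity whose parameter `w` skews the impurity curve, so that choosing `w != 0.5` yields an asymmetric impurity whose peak moves away from p = 0.5, which can favour splits isolating the minority class.

```python
# Hypothetical parameterised impurity family (illustrative only; not the
# family used in the thesis). p is the proportion of the positive class.
def asymmetric_gini(p, w=0.5):
    """Binary impurity: zero at p = 0 and p = 1; w = 0.5 recovers a
    symmetric, Gini-like curve, w != 0.5 gives an asymmetric one."""
    return p * (1 - p) / (w * p + (1 - w) * (1 - p))

# Symmetric setting: impurity is the same at p and 1 - p.
print(asymmetric_gini(0.3, w=0.5), asymmetric_gini(0.7, w=0.5))
# Asymmetric setting: the two values differ.
print(asymmetric_gini(0.3, w=0.7), asymmetric_gini(0.7, w=0.7))
```

A forest of "different impurities" in the sense above would then assign a different `w` (or a different family member) to each tree.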