Multi-Objective ROC learning for classification

Clark, Andrew Robert James

dc.contributor.author	Clark, Andrew Robert James	en_GB
dc.date.accessioned	2012-05-22T10:47:25Z	en_GB
dc.date.accessioned	2013-03-21T10:24:24Z
dc.date.issued	2011-12-15	en_GB
dc.description.abstract	Receiver operating characteristic (ROC) curves are widely used for evaluating classifier performance, having been applied to e.g. signal detection, medical diagnostics and safety-critical systems. They allow examination of the trade-offs between true and false positive rates as misclassification costs are varied. Examination of the resulting graphs and calculation of the area under the ROC curve (AUC) allows assessment of how well a classifier is able to separate two classes and allows selection of an operating point with full knowledge of the available trade-offs. In this thesis a multi-objective evolutionary algorithm (MOEA) is used to find classifiers whose ROC graph locations are Pareto optimal. The Relevance Vector Machine (RVM) is a state-of-the-art classifier that produces sparse Bayesian models, but is unfortunately prone to overfitting. Using the MOEA, hyper-parameters for RVM classifiers are set, optimising them not only in terms of true and false positive rates but also a novel measure of RVM complexity, thus encouraging sparseness, and producing approximations to the Pareto front. Several methods for regularising the RVM during the MOEA training process are examined and their performance evaluated on a number of benchmark datasets demonstrating they possess the capability to avoid overfitting whilst producing performance equivalent to that of the maximum likelihood trained RVM. A common task in bioinformatics is to identify genes associated with various genetic conditions by finding those genes useful for classifying a condition against a baseline. Typically, datasets contain large numbers of gene expressions measured in relatively few subjects. As a result of the high dimensionality and sparsity of examples, it can be very easy to find classifiers with near perfect training accuracies but which have poor generalisation capability. 
Additionally, depending on the condition and treatment involved, evaluation over a range of costs will often be desirable. An MOEA is used to identify genes for classification by simultaneously maximising the area under the ROC curve whilst minimising model complexity. This method is illustrated on a number of well-studied datasets and applied to a recent bioinformatics database resulting from the current InChianti population study. Many classifiers produce “hard”, non-probabilistic classifications and are trained to find a single set of parameters, whose values are inevitably uncertain due to limited available training data. In a Bayesian framework it is possible to ameliorate the effects of this parameter uncertainty by averaging over classifiers weighted by their posterior probability. Unfortunately, the required posterior probability is not readily computed for hard classifiers. In this thesis an Approximate Bayesian Computation Markov Chain Monte Carlo algorithm is used to sample model parameters for a hard classifier using the AUC as a measure of performance. The ability to produce ROC curves close to the Bayes optimal ROC curve is demonstrated on a synthetic dataset. Due to the large numbers of sampled parametrisations, averaging over them when rapid classification is needed may be impractical and thus methods for producing sparse weightings are investigated.	en_GB
dc.identifier.citation	Clark, A. and Everson, R. (2011). Evolving sparse multi-resolution RVM classifiers. In Dupenois, M. and Walker, D., editors, Proceedings of the 2nd Postgraduate Conference for Computing: Applications and Theory (PCCAT 2011), pages 53–60, Exeter, UK. PCCAT, College of Engineering, Mathematics and Physical Sciences, University of Exeter.	en_GB
dc.identifier.citation	Clark, A. and Everson, R. (2011). Multi-objective learning of relevance vector machine classifiers with multi-resolution kernels. Pattern Recognition, available online 7 March 2012 (http://www.sciencedirect.com/science/article/pii/S0031320312001033)	en_GB
dc.identifier.uri	http://hdl.handle.net/10036/3530	en_GB
dc.language.iso	en	en_GB
dc.publisher	University of Exeter	en_GB
dc.subject	Relevance Vector Machine	en_GB
dc.subject	Multi-objective optimisation	en_GB
dc.subject	ROC curves	en_GB
dc.subject	Classification	en_GB
dc.subject	Approximate Bayesian Computation	en_GB
dc.subject	Cross-validation	en_GB
dc.subject	Evolutionary algorithm	en_GB
dc.subject	Multi-resolution kernels	en_GB
dc.title	Multi-Objective ROC learning for classification	en_GB
dc.type	Thesis or dissertation	en_GB
dc.date.available	2012-05-22T10:47:25Z	en_GB
dc.date.available	2013-03-21T10:24:24Z
dc.contributor.advisor	Everson, Richard	en_GB
dc.publisher.department	Computer Science	en_GB
dc.type.degreetitle	PhD in Computer Science	en_GB
dc.type.qualificationlevel	Doctoral	en_GB
dc.type.qualificationname	PhD	en_GB
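
The abstract's central ideas (ROC points as trade-offs between true and false positive rates, the AUC as a summary, and Pareto optimality in ROC space) can be illustrated with a minimal sketch. This is not the thesis's MOEA or RVM code: the function names `roc_points`, `auc` and `pareto_front` and the toy scores and labels are hypothetical, chosen only for illustration. In the thesis each point would correspond to a candidate classifier; here, as a simplifying assumption, the points come from thresholding a single scoring function.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Return (FPR, TPR) pairs from thresholding at each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Trapezoidal area under the curve through the given ROC points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both FPR (lower) and TPR (higher)."""
    front = []
    for p in points:
        dominated = any(q != p and q[0] <= p[0] and q[1] >= p[1] for q in points)
        if not dominated:
            front.append(p)
    return sorted(set(front))


# Hypothetical toy data: six examples, three positive and three negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print("AUC:", auc(points))
print("Pareto-optimal ROC points:", pareto_front(points))
```

A point is dropped from the front when some other point achieves a false positive rate no higher and a true positive rate no lower. A multi-objective search such as the one the abstract describes would add further objectives (for example model complexity) to the same dominance test.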

Files in this item

Name: ClarkA.pdf
Size: 7.121Mb
Format: PDF
Description: Full Text of Thesis

Name: ClarkA_fm.pdf
Size: 90.62Kb
Format: PDF
Description: Front Matter

This item appears in the following Collection(s)

Doctoral Theses