Smooth Relevance Vector Machines
Schmolck, Alexander
Date: 1 May 2008
Thesis or dissertation
Publisher: University of Exeter
Degree Title: PhD in Computer Science
Abstract
Regression tasks belong to the set of core problems faced in statistics
and machine learning, and promising approaches can often be generalized to
also deal with classification, interpolation or denoising problems.
Whereas the most widely used classical statistical techniques place severe
a priori constraints on the type of function that can be approximated
(e.g. only lines, in the case of linear regression), the successes of
sparse kernel learners, such as the SVM (support vector machine),
demonstrate that good results may be obtained in a quite general framework
by enforcing sparsity.
Similarly, even very simple sparsity-based denoising techniques, such as
classical wavelet shrinkage, can produce surprisingly good results on a
wide variety of different signals, because, unlike noise, most signals of
practical interest share vital characteristics (such as smoothness, or the
ability to be well approximated by piecewise polynomials of low order)
that allow a sparse representation in wavelet space.
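As a concrete illustration of classical wavelet shrinkage (a standard
technique, not material from the thesis itself), the following sketch
implements VisuShrink-style soft thresholding; it assumes the PyWavelets
library, and the wavelet choice 'db4' is arbitrary.

    import numpy as np
    import pywt

    def wavelet_shrink(signal, wavelet='db4'):
        # Classical wavelet shrinkage: transform, soft-threshold the
        # detail coefficients, transform back.
        coeffs = pywt.wavedec(signal, wavelet)
        # Robust noise estimate from the finest-scale details (MAD).
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        # Universal threshold of Donoho & Johnstone: sigma * sqrt(2 log n).
        thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
        # Shrink every detail coefficient; keep the coarse approximation.
        shrunk = [coeffs[0]] + [pywt.threshold(c, thresh, mode='soft')
                                for c in coeffs[1:]]
        return pywt.waverec(shrunk, wavelet)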
On the other hand, results obtained from SVMs (and classical
wavelet shrinkage) suffer from a certain lack of interpretability, since
one cannot straightforwardly attach probabilities to them. By contrast,
regression, and even more importantly classification, in a Bayesian
context always entails a probabilistic measure of confidence in the
results, which, provided the model assumptions are reasonably accurate,
forms a basis for principled decision-making.
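For concreteness, in standard sparse Bayesian regression notation (this
follows Tipping, 2001, and is not specific to this thesis), with posterior
weight mean \(\boldsymbol{\mu}\), posterior covariance
\(\boldsymbol{\Sigma}\), basis responses \(\boldsymbol{\phi}(x_*)\) and
noise variance \(\sigma^2\), the prediction at a new input \(x_*\) is
itself a full distribution rather than a point estimate,
\[
p(t_* \mid \mathbf{t}) = \mathcal{N}\bigl(t_* \,\big|\,
\boldsymbol{\mu}^{\top}\boldsymbol{\phi}(x_*),\;
\sigma^2 + \boldsymbol{\phi}(x_*)^{\top}\boldsymbol{\Sigma}\,
\boldsymbol{\phi}(x_*)\bigr),
\]
whose variance supplies exactly the confidence measure referred to above.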
The relevance vector machine (RVM) combines these strengths by explicitly
encoding the criterion of model sparsity as a (Bayesian) prior over the
model weights and offers a single, unified paradigm to efficiently deal
with regression as well as classification tasks. However, the lack of an
explicit prior structure over the weight variances means that the degree
of sparsity is to a large extent controlled by the choice of kernel (and
kernel parameters). This can lead to severe overfitting or oversmoothing,
possibly even both at the same time (e.g. for the multiscale Doppler
data).
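The following is a minimal sketch of the standard RVM update equations of
Tipping (2001), included here for orientation and not taken from the
thesis: each weight gets a zero-mean Gaussian prior with its own precision
alpha_i, and re-estimating the alphas by evidence maximisation drives most
of them towards infinity, pruning the corresponding basis functions.

    import numpy as np

    def rvm_fit(Phi, t, n_iter=500, alpha_cap=1e9):
        # Phi: (n, m) design matrix of kernel responses; t: (n,) targets.
        n, m = Phi.shape
        alpha = np.ones(m)              # per-weight prior precisions
        beta = 1.0 / (0.1 * np.var(t))  # noise precision, rough initial guess
        for _ in range(n_iter):
            # Gaussian posterior over weights for current hyperparameters.
            Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
            mu = beta * Sigma @ Phi.T @ t
            # gamma_i in [0, 1]: how well-determined weight i is by the data.
            gamma = 1.0 - alpha * np.diag(Sigma)
            alpha = np.minimum(gamma / (mu**2 + 1e-12), alpha_cap)
            beta = (n - gamma.sum()) / (np.sum((t - Phi @ mu)**2) + 1e-12)
        # Weights whose precision hit the cap are effectively pruned, which
        # is the mechanism that makes the model sparse.
        relevant = alpha < alpha_cap
        return mu, Sigma, alpha, beta, relevant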
This thesis details an efficient scheme to control sparsity in Bayesian
regression by incorporating a flexible noise-dependent smoothness prior
into the RVM. The resultant smooth RVM (sRVM) encompasses the original RVM
as a special case, but empirical results with a variety of popular data
sets show that in many cases it can surpass RVM performance in terms of
goodness of fit and achieved sparsity, as well as computational
performance. As the smoothness prior effectively makes it possible to use
(highly efficient) wavelet kernels in an RVM setting, this work also
unveils a strong connection between Bayesian wavelet shrinkage and RVM
regression, and further extends the applicability of the RVM to denoising
tasks with up to millions of data points. We further discuss its
applicability to classification tasks.
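To make the stated connection concrete, here is an assumption-laden sketch
rather than the sRVM algorithm of the thesis: in an orthonormal wavelet
basis (Phi^T Phi = I, ignoring boundary effects) the RVM posterior
factorises per coefficient, so each update costs O(n) and amounts to an
individual shrinkage of every wavelet coefficient, i.e. a form of Bayesian
wavelet shrinkage. The sketch again assumes PyWavelets.

    import numpy as np
    import pywt

    def rvm_wavelet_denoise(signal, wavelet='db4', n_iter=50, alpha_cap=1e9):
        # Empirical wavelet coefficients, flattened into one array.
        w, slices = pywt.coeffs_to_array(pywt.wavedec(signal, wavelet))
        n = w.size
        alpha = np.ones_like(w)
        beta = 1.0 / (0.1 * np.var(signal) + 1e-12)
        for _ in range(n_iter):
            # With an orthonormal basis the posterior covariance is
            # diagonal, so every coefficient is shrunk independently.
            Sigma_diag = 1.0 / (alpha + beta)
            mu = beta * Sigma_diag * w
            gamma = 1.0 - alpha * Sigma_diag
            alpha = np.minimum(gamma / (mu**2 + 1e-12), alpha_cap)
            beta = (n - gamma.sum()) / (np.sum((w - mu)**2) + 1e-12)
        coeffs = pywt.array_to_coeffs(mu, slices, output_format='wavedec')
        return pywt.waverec(coeffs, wavelet)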