Smooth Relevance Vector Machines

Schmolck, Alexander

Abstract

Regression is one of the core problems of statistics and machine learning, and promising approaches to it can often be generalized to classification, interpolation, or denoising. Whereas the most widely used classical statistical techniques place severe a priori constraints on the type of function that can be approximated (e.g. only lines, in the case of linear regression), the successes of sparse kernel learners such as the support vector machine (SVM) demonstrate that good results may be obtained in a quite general framework by enforcing sparsity. Similarly, even very simple sparsity-based denoising techniques, such as classical wavelet shrinkage, can produce surprisingly good results on a wide variety of signals because, unlike noise, most signals of practical interest share vital characteristics (such as smoothness, or the ability to be well approximated by piecewise polynomials of low order) that allow a sparse representation in wavelet space. On the other hand, results obtained from SVMs (and from classical wavelet shrinkage) suffer from a certain lack of interpretability, since one cannot straightforwardly attach probabilities to them. By contrast, regression, and even more importantly classification, in a Bayesian context always entails a probabilistic measure of confidence in the results, which, provided the model assumptions are reasonably accurate, forms a basis for principled decision-making. The relevance vector machine (RVM) combines these strengths by explicitly encoding the criterion of model sparsity as a (Bayesian) prior over the model weights, and it offers a single, unified paradigm for dealing efficiently with regression as well as classification tasks. However, the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing, possibly even both at the same time (e.g. for the multiscale Doppler data). This thesis details an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. The resultant smooth RVM (sRVM) encompasses the original RVM as a special case, but empirical results with a variety of popular data sets show that in many cases it can surpass the RVM in goodness of fit, achieved sparsity, and computational performance. Because the smoothness prior effectively makes it possible to use (highly efficient) wavelet kernels in an RVM setting, this work also reveals a strong connection between Bayesian wavelet shrinkage and RVM regression, and it extends the applicability of the RVM to denoising tasks with up to millions of data points. We further discuss its applicability to classification tasks.

Doctoral Theses

Doctoral College