Kernels for Protein Homology Detection

Spalding, John Dylan

dc.contributor.author	Spalding, John Dylan	en_GB
dc.date.accessioned	2010-04-27T10:38:35Z	en_GB
dc.date.accessioned	2011-01-25T17:03:58Z	en_GB
dc.date.accessioned	2013-03-21T11:14:48Z
dc.date.issued	2009-10-09	en_GB
dc.description.abstract	Determining protein sequence similarity is an important task for protein classification and homology detection, and is typically performed using sequence alignment algorithms. Fast and accurate alignment-free, kernel-based classifiers exist that treat protein sequences as a “bag of words”. Kernels implicitly map the sequences to a high-dimensional feature space, and can be thought of as an inner product between two vectors in that space. This allows an algorithm that can be expressed purely in terms of inner products to be ‘kernelised’, so that the algorithm implicitly operates in the kernel’s feature space. A weighted string kernel, where the weighting is derived using probabilistic methods, is implemented using a binary data representation, and the results are reported. Alternative forms of data representation, such as Ising and frequency forms, are implemented and the results discussed. These results are then used to inform the development of a variety of novel kernels for protein sequence comparison. Alternative forms of classifier are investigated, such as nearest neighbour, support vector machines, and multiple kernel learning. A kernelised Gaussian classifier is derived and tested, which is informative as it returns a score related to the probability of a sequence belonging to a particular classification. Support vector machines are tested with the introduced kernels, and the results compared to the alternative classifiers. As similarity can be thought of as having different components, such as composition and position, multiple kernel learning is investigated with the novel kernels developed here. The results show that a support vector machine, using either single or multiple kernels, is the best classifier for remote protein homology detection out of all the classifiers tested in this thesis.	en_GB
dc.description.sponsorship	EPSRC	en_GB
dc.identifier.uri	http://hdl.handle.net/10036/97435	en_GB
dc.language.iso	en	en_GB
dc.publisher	University of Exeter	en_GB
dc.subject	Kernel methods	en_GB
dc.subject	Protein homology	en_GB
dc.subject	Support vector machine	en_GB
dc.subject	Classification	en_GB
dc.title	Kernels for Protein Homology Detection	en_GB
dc.type	Thesis or dissertation	en_GB
dc.date.available	2010-04-27T10:38:35Z	en_GB
dc.date.available	2011-01-25T17:03:58Z	en_GB
dc.date.available	2013-03-21T11:14:48Z
dc.contributor.advisor	Everson, Richard	en_GB
dc.contributor.advisor	Hoyle, David	en_GB
dc.publisher.department	Computer Science	en_GB
dc.type.degreetitle	PhD in Computer Science	en_GB
dc.type.qualificationlevel	Doctoral	en_GB
dc.type.qualificationname	PhD	en_GB
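
The abstract describes kernels that treat a protein sequence as a "bag of words" and act as an inner product in a feature space. As a rough illustration only, the sketch below computes a plain k-mer count (spectrum-style) kernel. It is not the weighted string kernel developed in the thesis; the function names `kmer_counts` and `spectrum_kernel`, the choice k=3, and the example sequences are all assumptions made for this example.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count every overlapping k-mer ("word") in a sequence.

    Hypothetical helper: the sequence is treated as an unordered bag of
    length-k substrings, so positional information is discarded.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the two k-mer count vectors.

    The feature space (one dimension per possible k-mer) is never built
    explicitly; only k-mers present in seq_a contribute to the sum.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    # Made-up toy sequences, not real protein data.
    print(spectrum_kernel("MKVMKV", "MKVA"))
    print(spectrum_kernel("MKVMKV", "MKVMKV"))
```

Because the function returns an ordinary inner product, any algorithm that can be written purely in terms of inner products could in principle take it as a drop-in similarity measure, which is the "kernelisation" idea the abstract refers to.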

Files in this item

Name: SpaldingJ.pdf
Size: 1.072Mb
Format: PDF


Name: SpaldingJ_fm.pdf
Size: 225.5Kb
Format: PDF


This item appears in the following Collection(s)

Doctoral Theses
