dc.contributor.author: Spalding, John Dylan [en_GB]
dc.date.accessioned: 2010-04-27T10:38:35Z [en_GB]
dc.date.accessioned: 2011-01-25T17:03:58Z [en_GB]
dc.date.accessioned: 2013-03-21T11:14:48Z
dc.date.issued: 2009-10-09 [en_GB]
dc.description.abstract: Determining protein sequence similarity is an important task for protein classification and homology detection, and is typically performed using sequence alignment algorithms. Fast and accurate alignment-free, kernel-based classifiers exist that treat protein sequences as a “bag of words”. Kernels implicitly map the sequences to a high-dimensional feature space, and can be thought of as an inner product between two vectors in that space. This allows an algorithm that can be expressed purely in terms of inner products to be ‘kernelised’, so that the algorithm implicitly operates in the kernel’s feature space. A weighted string kernel, where the weighting is derived using probabilistic methods, is implemented using a binary data representation, and the results are reported. Alternative forms of data representation, such as Ising and frequency forms, are implemented and the results are discussed. These results are then used to inform the development of a variety of novel kernels for protein sequence comparison. Alternative forms of classifier are investigated, such as nearest neighbour, support vector machines, and multiple kernel learning. A kernelised Gaussian classifier is derived and tested, which is informative as it returns a score related to the probability of a sequence belonging to a particular classification. Support vector machines are tested with the introduced kernels, and the results are compared with those of the alternative classifiers. As similarity can be thought of as having different components, such as composition and position, multiple kernel learning is investigated with the novel kernels developed here. The results show that a support vector machine, using either single or multiple kernels, is the best classifier for remote protein homology detection out of all the classifiers tested in this thesis. [en_GB]
dc.description.sponsorship: EPSRC [en_GB]
dc.identifier.uri: http://hdl.handle.net/10036/97435 [en_GB]
dc.language.iso: en [en_GB]
dc.publisher: University of Exeter [en_GB]
dc.subject: Kernel methods [en_GB]
dc.subject: Protein homology [en_GB]
dc.subject: Support vector machine [en_GB]
dc.subject: Classification [en_GB]
dc.title: Kernels for Protein Homology Detection [en_GB]
dc.type: Thesis or dissertation [en_GB]
dc.date.available: 2010-04-27T10:38:35Z [en_GB]
dc.date.available: 2011-01-25T17:03:58Z [en_GB]
dc.date.available: 2013-03-21T11:14:48Z
dc.contributor.advisor: Everson, Richard [en_GB]
dc.contributor.advisor: Hoyle, David [en_GB]
dc.publisher.department: Computer Science [en_GB]
dc.type.degreetitle: PhD in Computer Science [en_GB]
dc.type.qualificationlevel: Doctoral [en_GB]
dc.type.qualificationname: PhD [en_GB]

