Affiliations: The European Bioinformatics Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridge, CB10 1SD, UK | The University of Southampton, School of Electronics
and Computer Science, University Road, Southampton, SO17 1BJ, UK
Abstract: Distant evolutionary relationships between proteins with low
sequence similarity are difficult to recognise by computational methods.
Consequently, many sequences obtained from large-scale sequencing projects
cannot be assigned to any known proteins or families despite being
evolutionarily related. To boost sensitivity, various sequence-based methods
have been modified to make use of the better-conserved secondary structure.
Most of these methods are instance-based or generative. Here, we introduce a
kernel-based remote homology detection method that allows for a combination of
sequence and secondary-structure similarity scores in a discriminative
approach. We studied the ability of the method to predict superfamily
membership as defined by the SCOP database. We show that a kernel method that
combined sequence similarity scores with predicted secondary-structure
similarity scores performed similarly to a classifier that used scores calculated
from sequences and true secondary structures, but performed better than a
sequence-only classifier and achieved a better mean than recently
published results on the same data set. We conclude that SVM classifiers trained to predict homology
between distantly related proteins become more accurate when a joint
sequence/secondary-structure similarity score approach is used.
Keywords: Remote homology detection, support vector machines, secondary structures
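
As an illustration of what combining sequence and secondary-structure similarity scores in a kernel could look like, the following minimal sketch builds a Gram matrix from two sets of per-protein score vectors. The abstract does not specify how the scores are combined, so the weighted sum of two linear kernels, the function name `combined_kernel`, and the `weight` parameter are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain inner product of two equal-length score vectors."""
    if len(u) != len(v):
        raise ValueError("score vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def combined_kernel(
    seq_scores: Sequence[Vector],
    ss_scores: Sequence[Vector],
    weight: float = 0.5,  # hypothetical mixing parameter, not from the paper
) -> List[List[float]]:
    """Return K with K[i][j] = (1 - w) * <s_i, s_j> + w * <t_i, t_j>.

    s_i holds protein i's sequence similarity scores and t_i its
    secondary-structure similarity scores. A non-negative weighted sum of
    two valid kernels is itself a valid kernel.
    """
    n = len(seq_scores)
    if len(ss_scores) != n:
        raise ValueError("both score sets must describe the same proteins")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [
            (1.0 - weight) * dot(seq_scores[i], seq_scores[j])
            + weight * dot(ss_scores[i], ss_scores[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy example: two proteins, invented scores purely for demonstration.
toy_seq = [[1.0, 0.0], [0.0, 2.0]]
toy_ss = [[1.0], [3.0]]
toy_K = combined_kernel(toy_seq, toy_ss, weight=0.5)
```

Setting `weight=0.0` in this sketch reduces it to a sequence-only kernel, which gives a simple baseline for comparison. A Gram matrix of this form could then be passed to any SVM implementation that accepts precomputed kernels.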