A novel approach to remote homology detection: jumping alignments

Spang R, Rehmsmeier M, Stoye J (2002)
Journal of Computational Biology 9(5): 747-760.

Journal article | Published | English
 
Abstract
We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and profile hidden Markov models, which focus on vertical information because they model the columns of the alignment independently, and to family pairwise search, which focuses on horizontal information because it treats the given sequences separately. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. To do so, each candidate sequence is tested separately against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, at each position the candidate sequence is aligned to one sequence of the multiple alignment, called the "reference sequence". The reference sequence may change within the alignment, and each such jump is penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compare it to profiles, profile hidden Markov models, and family pairwise search on a subset of the SCOP database of protein domains. The discriminative quality is assessed by median false positive counts (med-FP-counts). For moderate med-FP-counts, the number of successful searches with our method is considerably higher than with the competing methods.
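The jumping alignment idea in the abstract can be illustrated with a small dynamic program. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain match/mismatch scores instead of a substitution matrix, a linear gap cost, and a constant jump penalty. The function name `jumping_alignment_score`, the helper `reach`, and all default parameter values are hypothetical choices made for illustration.

```python
def jumping_alignment_score(candidate, alignment, match=2, mismatch=-1,
                            gap=2, jump=3):
    """Best local score of `candidate` against a multiple alignment.

    `alignment` is a list of equal-length rows that may contain '-'.
    At every column the candidate is scored against one row (the
    reference sequence); switching rows costs `jump`.
    All scoring parameters are illustrative assumptions.
    """
    k = len(alignment)          # number of rows (possible reference sequences)
    m = len(alignment[0])       # number of alignment columns
    n = len(candidate)

    def reach(cell, r):
        # Best score with which row r can continue from `cell`:
        # either stay in row r, or jump in from the best row and pay `jump`.
        return max(cell[r], max(cell) - jump)

    # prev[j][r]: best score of a local alignment of candidate[:i-1] and
    # columns [:j] that ends in reference row r (zero borders, as in
    # Smith-Waterman).
    prev = [[0] * k for _ in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        cur = [[0] * k for _ in range(m + 1)]
        for j in range(1, m + 1):
            for r in range(k):
                symbol = alignment[r][j - 1]
                if symbol == '-':
                    # Residue placed against a gap in the reference row.
                    diag = reach(prev[j - 1], r) - gap
                    # Skipping a column where the reference has a gap is free.
                    skip = reach(cur[j - 1], r)
                else:
                    pair = match if candidate[i - 1] == symbol else mismatch
                    diag = reach(prev[j - 1], r) + pair
                    # Reference residue left unaligned.
                    skip = reach(cur[j - 1], r) - gap
                # Candidate residue left unaligned.
                ins = reach(prev[j], r) - gap
                cur[j][r] = max(0, diag, skip, ins)
                best = max(best, cur[j][r])
        prev = cur
    return best


if __name__ == "__main__":
    rows = ["AAAAWWWW", "CCCCDDDD"]
    print(jumping_alignment_score("AAAADDDD", rows))
```

With a single row and no gap characters, this sketch reduces to a Smith-Waterman score with linear gap costs. In the two-row example, the best path follows the first row for four columns and then jumps to the second, which no single reference sequence could achieve on its own.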
Keywords
Homology detection; Jumping alignments; Protein classification; Sequence analysis
Publication year
2002
Journal title
Journal of Computational Biology
Volume
9
Issue
5
Page(s)
747-760
ISSN
1066-5277
eISSN
1557-8666
Page URI
https://pub.uni-bielefeld.de/record/1773578

Cite

Spang R, Rehmsmeier M, Stoye J. A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology. 2002;9(5):747-760.
Spang, R., Rehmsmeier, M., & Stoye, J. (2002). A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology, 9(5), 747-760. https://doi.org/10.1089/106652702761034172
All files are available under the following license(s):
Copyright Statement:
This object is protected by copyright and/or related rights. [...]
Full text(s)
Access Level
OA Open Access
Last uploaded
2019-09-06T08:48:08Z
MD5 checksum
d74c8f1ce5f4359744115523b4715a1d


14 citations in Europe PMC

Data provided by Europe PubMed Central.

Decoding noises in HIV computational genotyping.
Jia M, Shaw T, Zhang X, Liu D, Shen Y, Ezeamama AE, Yang C, Zhang M., Virology 511, 2017
PMID: 28918303
Pareto optimization in algebraic dynamic programming.
Saule C, Giegerich R., Algorithms Mol Biol 10, 2015
PMID: 26150892
Probabilistic inference of viral quasispecies subject to recombination.
Töpfer A, Zagordi O, Prabhakaran S, Roth V, Halperin E, Beerenwinkel N., J Comput Biol 20(2), 2013
PMID: 23383997
Haploid to diploid alignment for variation calling assessment.
Mäkinen V, Rahkola J., BMC Bioinformatics 14 Suppl 15, 2013
PMID: 24564537
TCRep 3D: an automated in silico approach to study the structural properties of TCR repertoires.
Leimgruber A, Ferber M, Irving M, Hussain-Kahn H, Wieckowski S, Derré L, Rufer N, Zoete V, Michielin O., PLoS One 6(10), 2011
PMID: 22053188
HIV classification using the coalescent theory.
Bulla I, Schultz AK, Schreiber F, Zhang M, Leitner T, Korber B, Morgenstern B, Stanke M., Bioinformatics 26(11), 2010
PMID: 20400454
Homology and phylogeny and their automated inference.
Fuellen G., Naturwissenschaften 95(6), 2008
PMID: 18288471
Recco: recombination analysis using cost optimization.
Maydt J, Lengauer T., Bioinformatics 22(9), 2006
PMID: 16488909
A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes.
Schultz AK, Zhang M, Leitner T, Kuiken C, Korber B, Morgenstern B, Stanke M., BMC Bioinformatics 7, 2006
PMID: 16716226
jpHMM at GOBICS: a web server to detect genomic recombinations in HIV-1.
Zhang M, Schultz AK, Calef C, Kuiken C, Leitner T, Korber B, Morgenstern B, Stanke M., Nucleic Acids Res 34(web server issue), 2006
PMID: 16845050
A sequence sub-sampling algorithm increases the power to detect distant homologues.
Johnston CR, Shields DC., Nucleic Acids Res 33(12), 2005
PMID: 16006623
A robust method to detect structural and functional remote homologues.
Shachar O, Linial M., Proteins 57(3), 2004
PMID: 15382232

Sources

PMID: 12487762