Show simple item record

dc.contributor.author: Studholme, DJ
dc.contributor.author: Rawlings, ND
dc.contributor.author: Barrett, Alan J.
dc.contributor.author: Bateman, A
dc.date.accessioned: 2015-06-12T12:59:13Z
dc.date.issued: 2003-05-09
dc.description.abstract: BACKGROUND: We wished to compare two databases based on sequence similarity: one that aims to be comprehensive in its coverage of known sequences, and one that specialises in a relatively small subset of known sequences. One of the motivations behind this study was quality control. Pfam is a comprehensive collection of alignments and hidden Markov models representing families of proteins and domains. MEROPS is a catalogue and classification of enzymes with proteolytic activity (peptidases or proteases). These secondary databases are used by researchers worldwide, yet their contents are not peer reviewed. Therefore, we hoped that a systematic comparison of the contents of Pfam and MEROPS would highlight missing members and false-positives leading to improvements in quality of both databases. An additional reason for carrying out this study was to explore the extent of consensus in the definition of a protein family. RESULTS: About half (89 out of 174) of the peptidase families in MEROPS overlapped single Pfam families. A further 32 MEROPS families overlapped multiple Pfam families. Where possible, new Pfam families were built to represent most of the MEROPS families that did not overlap Pfam. When comparing the numbers of sequences found in the overlap between a MEROPS family and its corresponding Pfam family, in most cases the overlap was substantial (52 pairs of MEROPS and Pfam families had an intersection size of greater than 75% of the union) but there were some differences in the sets of sequences included in the MEROPS families versus the overlapping Pfam families. CONCLUSIONS: A number of the discrepancies between MEROPS families and their corresponding Pfam families arose from differences in the aims and philosophies of the two databases. Examination of some of the discrepancies highlighted additional members of families, which have subsequently been added in both Pfam and MEROPS. This has led to improvements in the quality of both databases. Overall there was a great deal of consensus between the databases in definitions of a protein family. (en_GB)
dc.description.sponsorship: Wellcome Trust (en_GB)
dc.description.sponsorship: MRC (en_GB)
dc.identifier.citation: Vol. 4, pp. 17 (en_GB)
dc.identifier.doi: 10.1186/1471-2105-4-17
dc.identifier.uri: http://hdl.handle.net/10871/17516
dc.language.iso: en (en_GB)
dc.publisher: BioMed Central (en_GB)
dc.relation.url: http://www.ncbi.nlm.nih.gov/pubmed/12740029 (en_GB)
dc.relation.url: http://www.biomedcentral.com/1471-2105/4/17 (en_GB)
dc.rights: This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. (en_GB)
dc.subject: Animals (en_GB)
dc.subject: Bacterial Proteins (en_GB)
dc.subject: Caenorhabditis elegans Proteins (en_GB)
dc.subject: Cysteine Endopeptidases (en_GB)
dc.subject: Databases, Protein (en_GB)
dc.subject: Drosophila Proteins (en_GB)
dc.subject: Multienzyme Complexes (en_GB)
dc.subject: Mycobacterium tuberculosis (en_GB)
dc.subject: Peptide Hydrolases (en_GB)
dc.subject: Proteasome Endopeptidase Complex (en_GB)
dc.subject: Retroviridae (en_GB)
dc.subject: Sensitivity and Specificity (en_GB)
dc.subject: Viral Proteins (en_GB)
dc.title: A comparison of Pfam and MEROPS: two databases, one comprehensive, and one specialised. (en_GB)
dc.type: Article (en_GB)
dc.date.available: 2015-06-12T12:59:13Z
dc.identifier.issn: 1471-2105
exeter.place-of-publication: England
dc.description: Comparative Study (en_GB)
dc.description: Journal Article (en_GB)
dc.description: Research Support, Non-U.S. Gov't (en_GB)
dc.description: Copyright © 2003 Studholme et al; licensee BioMed Central Ltd. (en_GB)
dc.identifier.journal: BMC Bioinformatics (en_GB)

