Coverage of protein sequence space by current structural genomics targets

Nicholas O'Toole, S. Raymond, M. Cygler

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


By its purest definition, the ultimate goal of structural genomics (SG) is to determine the structures of all proteins encoded by genomes. Most of these structures will be obtained by homology modeling, using as templates the structures of a set of target proteins selected for experimental determination. Thanks to the open exchange of SG target information, we are able to analyze the sequences of the current target list and evaluate the extent of its coverage of protein sequence space. For each protein sequence in several organisms, we have determined whether homologous sequences are currently present either in the Protein Data Bank (PDB) or among SG targets. In this way we are able to evaluate the coverage by existing or targeted structural data for the non-membranous parts of entire proteomes. For small bacterial proteomes, such as that of H. influenzae, almost all proteins have homologous sequences among SG targets or in the PDB. Coverage is significantly lower for more complex organisms, such as C. elegans. We have mapped the SG target list onto the ProtoMap clustering of protein sequences. Clusters occupied by SG targets represent over 150,000 protein sequences, approximately 44% of the total protein sequences classified by ProtoMap. The mapping of SG targets also enables an evaluation of the degree of overlap within the target list: an SG target typically occupies a ProtoMap cluster with more than six other homologous targets.
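The cluster-based bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every sequence, including each SG target, has already been assigned to exactly one cluster; the function name `cluster_coverage`, the identifiers, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import median


def cluster_coverage(cluster_of, targets):
    """Summarize how a target list covers a clustering of sequences.

    cluster_of: dict mapping sequence id -> cluster id (hypothetical input format)
    targets:    iterable of sequence ids that are SG targets
    Returns (covered_sequences, covered_fraction, median_co_targets).
    """
    # Number of sequences in each cluster.
    cluster_sizes = Counter(cluster_of.values())

    # Keep only targets that have a cluster assignment, without duplicates.
    mapped = [t for t in set(targets) if t in cluster_of]

    # Number of targets falling in each occupied cluster.
    targets_per_cluster = Counter(cluster_of[t] for t in mapped)

    # A sequence counts as covered if its cluster holds at least one target.
    covered = sum(cluster_sizes[c] for c in targets_per_cluster)
    fraction = covered / len(cluster_of) if cluster_of else 0.0

    # Overlap: for each target, how many other targets share its cluster.
    others = [targets_per_cluster[cluster_of[t]] - 1 for t in mapped]
    typical = median(others) if others else 0

    return covered, fraction, typical


# Toy data (invented for illustration, not taken from the paper).
toy_clusters = {
    "s1": "A", "s2": "A", "s3": "A",
    "s4": "B", "s5": "B",
    "s6": "C", "s7": "C",
    "s8": "D",
}
toy_targets = ["s1", "s2", "s4"]

covered, fraction, typical = cluster_coverage(toy_clusters, toy_targets)
print(covered, fraction, typical)
```

In this toy case, clusters A and B are occupied by targets, so five of the eight sequences count as covered. The median is used here as one possible reading of "typically"; the abstract does not state which summary statistic was used.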
Original language: English
Pages (from-to): 47-55
Journal: Journal of Structural and Functional Genomics
Issue number: 2-3
Publication status: Published - 2003

