Resource for benchmarking the applicability of protein structure models

Carbajo Pedrosa, Daniel

The function of a protein is closely related to the structure it adopts. The sequence of a protein is of limited biological relevance without some knowledge of both its structure and its function: protein structures provide a wealth of information that cannot be deduced from the primary sequence alone, so analyzing proteins in structural terms gives a much fuller understanding of their roles. Structure-based methodologies are consequently regarded as more robust than sequence-based ones. Their limiting step is obtaining the structure of the protein in the first place. Given the ever-increasing gap between known protein sequences and solved structures, and the growing number of structure prediction methods, which become more accurate over time, the use of protein structure models is unavoidable. However, despite progress in protein structure prediction, computed models often contain inaccuracies in both backbone and side-chain coordinates. Rather than being discarded, such models can still provide important insights into the function of their native counterparts. This demands robust methods that can make effective use of models in the medium and low ranges of accuracy, which are routinely produced by proteome-scale structure modeling projects. Any structure-based algorithm that does not require high-resolution structures therefore has a clear advantage and considerable practical value. ModelDB, the tool introduced here, is intended as a resource for testing any structure-based method (such as an active-site or ligand-binding-site predictor) on protein structure models of different quality. The ultimate goal is to benchmark the applicability of protein structure models for a given novel algorithm.
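As a rough illustration of what such a benchmark measures, the Python sketch below compares a method's output on the native structure with its output on decoys of decreasing quality and reports the worst decoy quality reached before the results diverge. It is a minimal sketch under stated assumptions, not ModelDB's implementation: the quality measure (RMSD to the native, in angstroms), the agreement measure (Jaccard index over predicted site residues), the 0.5 threshold and the names `jaccard` and `applicability_limit` are all hypothetical choices.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical helpers: the agreement measure (Jaccard index) and the
# quality measure (RMSD to the native, in angstroms) are illustrative choices.


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between two sets of predicted site residues (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def applicability_limit(
    native_sites: Set[int],
    decoys: List[Tuple[float, Set[int]]],
    min_agreement: float = 0.5,
) -> Optional[float]:
    """Return the largest decoy RMSD reached before the method's prediction
    first falls below min_agreement with its prediction on the native.

    decoys holds (rmsd_to_native, predicted_site_residues) pairs; they are
    scanned from best (lowest RMSD) to worst. None means even the best decoy
    already disagrees.
    """
    limit = None
    for rmsd, predicted in sorted(decoys, key=lambda d: d[0]):
        if jaccard(native_sites, predicted) < min_agreement:
            break
        limit = rmsd
    return limit


native = {10, 12, 45, 46, 80}  # e.g. ligand-binding residues predicted on the native
decoys = [
    (1.0, {10, 12, 45, 46, 80}),
    (2.5, {10, 12, 45, 46}),
    (4.0, {10, 45, 81}),
    (6.0, {3, 99}),
]
print(applicability_limit(native, decoys))  # 2.5
```

Stopping at the first failure, rather than counting all agreeing decoys, treats applicability as a quality threshold; a real benchmark would more likely aggregate over many proteins and several decoys per quality level.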
ModelDB builds sets of models of decreasing quality, which we call decoys, given the sequences of experimentally determined proteins. A decoy is a computer-generated protein structure that shares some characteristics with native proteins but is not biologically real. The system is implemented so that any existing structure-based method can be run on both the native structure and the decoy models. The next step is to assess automatically at which level of quality the results of the tested method start to differ from those obtained with the native structure. Each decoy is compared directly with its native structure, and precise quality scores are computed. To give visual insight into what models of different quality look like and how they differ spatially from the native structure, the models are colored according to several color schemes, each defined by one of the following spatial descriptors: solvent accessibility, secondary structure, cavity occurrence, average depth, protrusion index and burial index. This makes it easy to see and understand how these parameters vary within the protein structural context. In addition, functional annotation is provided when available, in terms of catalytic sites, ligand-binding sites and other relevant sites such as glycosylation sites. The tool is publicly available both as an on-line tool and as a local application for larger calculations; it relies on other in-house tools that also exist independently, on-line and for local use. One of these tools, mappON, colors input structures according to various descriptors and outputs a table with the descriptors of selected residues (and of the residues surrounding them); it therefore serves to analyze the properties of key residues in their structural context and to inspect the results visually.
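The abstract does not name the quality scores ModelDB computes. One widely used score for comparing a model with its native structure is the root-mean-square deviation (RMSD) of matched atoms after optimal superposition; the sketch below computes it with the Kabsch algorithm using NumPy. The function name `kabsch_rmsd` is hypothetical, and the sketch assumes residues are already matched one-to-one (for example, one C-alpha atom per residue).

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, native: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition.

    Rows must already correspond residue-by-residue (e.g. C-alpha atoms).
    """
    model = np.asarray(model, dtype=float)
    native = np.asarray(native, dtype=float)
    if model.shape != native.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with matched rows")
    # Remove translation by centering both sets on their centroids.
    p = model - model.mean(axis=0)
    q = native - native.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


native = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                   [0.0, 1.5, 1.5], [3.0, 1.0, 2.0]])
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
model = native @ rz.T + np.array([5.0, -3.0, 2.0])  # same shape, moved in space
print(round(kabsch_rmsd(model, native), 6))  # 0.0
```

Because the superposition removes rigid-body motion, a model that is merely rotated and translated scores 0; only genuine geometric differences raise the value.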
The other, MAP, addresses the common bioinformatics problem of mapping sequence residues onto a structure, or the residues of one structure onto another. Very few other public resources exist for readily retrieving decoy sets of protein structures, and we are not aware of any other automated pipeline that produces such decoys in an easy and user-friendly way. Besides allowing users to build new decoy sets for any protein of interest, our tool covers many more proteins, representing a larger portion of protein structural space, than any other resource. Furthermore, the on-line version lets the user visually inspect and compare all the models of varying quality for a given protein in the same spatial frame. The decoy sets are designed to test structure-based methods and to determine to what extent those methods can make use of predicted protein structure models. Beyond that purpose, the functional annotation, the model quality estimates and the color schemes also make many large-scale analyses possible.
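The abstract does not describe how MAP performs its mappings. A common approach for the sequence-to-structure case is a global (Needleman-Wunsch) alignment between the query sequence and the sequence of residues actually present in the structure, followed by reading off the aligned pairs. The sketch below assumes that approach; the function name `map_sequence_to_structure` and the simple match/mismatch/gap scores are illustrative, not MAP's actual parameters (a real tool would more likely use a substitution matrix such as BLOSUM62 and affine gap penalties).

```python
from typing import Dict, List


def map_sequence_to_structure(
    seq: str,
    struct_seq: str,
    struct_resnums: List[int],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -2,
) -> Dict[int, int]:
    """Globally align seq to the structure's residue sequence and return
    {1-based position in seq: residue number in the structure} for every
    aligned (non-gap) pair, including mismatches."""
    if len(struct_resnums) != len(struct_seq):
        raise ValueError("one residue number is needed per structure residue")
    n, m = len(seq), len(struct_seq)

    def pair(i: int, j: int) -> int:
        return match if seq[i - 1] == struct_seq[j - 1] else mismatch

    # Needleman-Wunsch score matrix with linear gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner, preferring aligned pairs.
    mapping: Dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            mapping[i] = struct_resnums[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # residue present in seq but missing from the structure
        else:
            j -= 1  # residue present in the structure but not in seq
    return dict(sorted(mapping.items()))


# The structure lacks the first two residues and is numbered from 101.
print(map_sequence_to_structure("MKTAYIAKQR", "TAYIAKQR", list(range(101, 109))))
# {3: 101, 4: 102, 5: 103, 6: 104, 7: 105, 8: 106, 9: 107, 10: 108}
```

Aligned mismatches are kept in the mapping because a substituted residue still occupies the corresponding structural position; query residues missing from the structure, and structure residues with no counterpart in the query, are simply left out.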

Resource for benchmarking the applicability of protein structure models / CARBAJO PEDROSA, Daniel. - (2012 Feb 29).