Wednesday, December 11, 2013

Application and implementation of probabilistic profile-profile comparison methods for protein fold recognition

Fold recognition is a method for detecting the fold and predicting the tertiary structure of proteins that lack homologous sequences of known fold and structure deposited in the Protein Data Bank. Fold recognition methods are based on the assumption that the number of different protein folds in nature is strictly limited, mostly as a result of evolution and of the basic physical and chemical constraints on polypeptide chains. These methods are useful for protein structure prediction, evolutionary analysis, prediction of metabolic pathways and enzymatic efficiency, molecular docking, and drug design. Currently, about 1300 protein folds have been discovered and characterized in the SCOP and CATH databases, and every newly discovered protein sequence has a significant chance of being classified into one of them. Many different approaches have been proposed for finding the correct fold for a new sequence, and it is often useful to include evolutionary information for the query as well as for the target proteins. One way to include this information is to compare the sequence profiles of the query and the target; fold recognition techniques that do so are called profile-profile methods. Profile-profile alignments can be calculated using a dot product, a probabilistic model, or stochastic or theoretical measures. This work presents applications and implementations of probabilistic profile-profile comparison methods and the advantages of a probabilistic scoring function over comparable fold recognition techniques. The purpose of the comparison is to show that probabilistic profile-profile methods may outperform other fold recognition methods in the analysis of distantly related proteins, and that they can be applied not only to fold recognition but also to related tasks such as gene identification, detection of domain boundaries, and modeling of complex proteins.
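To make the difference between a dot-product score and a probabilistic score concrete, here is a minimal sketch in Python that scores a single pair of profile columns both ways. It is an illustration under stated assumptions, not the scoring function of any particular tool: the four-letter alphabet, the uniform background frequencies, the example columns, and the function names `dot_product_score` and `log_odds_score` are all hypothetical, and the log-odds form shown is only one possible probabilistic column score.

```python
import math

def dot_product_score(p, q):
    """Plain dot product of two profile columns (probability vectors)."""
    return sum(a * b for a, b in zip(p, q))

def log_odds_score(p, q, background):
    """Log-odds column score: log of sum_i p_i * q_i / f_i.

    Positive when the two columns agree more often than expected
    from the background frequencies f, negative when they agree less.
    This specific form is an assumption chosen for illustration.
    """
    return math.log(sum(a * b / f for a, b, f in zip(p, q, background)))

# Toy four-letter alphabet with uniform background (assumed values).
background = [0.25, 0.25, 0.25, 0.25]
col_a = [0.7, 0.1, 0.1, 0.1]   # column dominated by the first residue type
col_b = [0.6, 0.2, 0.1, 0.1]   # similar to col_a
col_c = [0.1, 0.1, 0.1, 0.7]   # dominated by a different residue type

print(dot_product_score(col_a, col_b), log_odds_score(col_a, col_b, background))
print(dot_product_score(col_a, col_c), log_odds_score(col_a, col_c, background))
```

The dot product is always non-negative, so on its own it has no natural zero separating related columns from unrelated ones. The log-odds score is measured against the background: similar columns score above zero and dissimilar ones below, which is what lets column scores be summed along an alignment in a meaningful way.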