Structural genomics is a broad term for the process of determining the structural representation of the information in the human genome; at present it is limited almost exclusively to proteins. Although genetic information is commonly understood to mean “genes and their encoded protein products”, thousands of human genes produce transcripts that are biologically important but do not necessarily encode proteins. Furthermore, even though the sequence of human DNA is now known, the meaning of most of it remains unknown. It is very likely that the number of genes has been greatly underestimated, mainly because current gene finders work well only for large, highly expressed, evolutionarily conserved protein-coding genes. Most of the genome elements missed in this way encode RNA, of which transfer and ribosomal RNAs are the classical examples. Besides these well-known molecules, however, there is a vast unknown world of tiny RNAs that might play a crucial role in a number of cellular processes. These elements are called noncoding RNAs (ncRNAs), and they perform their function without being translated into a protein product.
Here we propose the development of an integrated bioinformatics platform designed specifically for detecting, verifying, and classifying noncoding RNAs. This comprehensive approach to "Computational RNomics" will provide a pipeline capable of detecting RNA motifs with low sequence conservation. It will also integrate RNA motif prediction, which should significantly improve the quality of RNA homolog searches.
Monday, October 18, 2010
Friday, January 1, 2010
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
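To illustrate why candidate matches need further filtering, here is a minimal sketch of a pattern-based motif scan in Python. It assumes that motifs are written as regular expressions; the two patterns, the motif names, and the `scan_motifs` helper are hypothetical and chosen only for illustration. They are not entries from the ELM resource, and this is not the ELM server's implementation.

```python
import re

# Hypothetical motif table: names and patterns are illustrative only,
# not taken from the ELM resource.
MOTIFS = {
    "PxxP_like": r"P..P",   # proline, any two residues, proline
    "ST_P_like": r"[ST]P",  # serine or threonine followed by proline
}


def scan_motifs(sequence, motifs=MOTIFS):
    """Return (name, start, end, match) tuples for every pattern match.

    Positions are 1-based and inclusive, as is usual for protein sequences.
    """
    hits = []
    for name, pattern in motifs.items():
        # Wrap the pattern in a lookahead so overlapping matches are reported.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(sequence):
            start = m.start(1)
            matched = m.group(1)
            hits.append((name, start + 1, start + len(matched), matched))
    # Sort by start position, then by motif name, for stable output.
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    # Made-up peptide used only to exercise the scanner.
    for hit in scan_motifs("MAPPLPSPTK"):
        print(hit)
```

Because such patterns are only a few residues long, they match by chance in many sequences, which is why contextual information (domains, structure, native disorder, cellular and taxonomic context) is needed to deprecate false positives.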