The human genome contains an estimated 20,000 protein-coding genes. Proteins are the body’s “workers”, tasked with performing specific functions that are key to survival. Yet there is a class of very small proteins, with fewer than 100 amino acids, that are essential to understanding how living things work and about which we know virtually nothing, since merely identifying them is a major technological challenge.
Now, however, investigators from the Centre for Genomic Regulation (CRG) in Barcelona, led by ICREA Research Professor Luis Serrano, head of the Design of Biological Systems group, have developed a technique that can predict and classify these proteins using a new bioinformatics tool into which they fed multiple types of -omics data. This enabled them to discover that these small proteins account for at least 16% of the bacterial genome. Their findings have been published in the journal Molecular Systems Biology.
“We studied the bacterium Mycoplasma pneumoniae and discovered that we could be overlooking up to 10 out of every 100 of the proteins coded in its reduced genome simply because they are so small”, said María Lluch-Senar, staff scientist at the CRG and the study’s principal investigator. “This percentage could be highly significant in the case of more complex organisms, including humans”, she added.
Recent studies have shed some light on the importance of these small proteins, such as the antimicrobial peptides secreted by insects, other animals, plants and even human beings in response to infection. Small proteins have also been shown to let bacteria communicate with other bacteria in their environment and with their host, such as the human body. In fact, they may play a very important role in maintaining a balanced microbiota.
“The interest of our study lay in ascertaining the number and variety of functions that these hitherto disregarded proteins could present”, explained Samuel Miravet-Verde, a PhD student at the CRG and the lead author of the work.
Until now, when a genome was annotated, only DNA segments that, following transcription and translation, could yield proteins with more than 100 amino acids were taken into account. Anything below this number was disregarded because of the technological challenge involved: the usual approaches for identifying proteins do not work on proteins this small. This is further complicated by the fact that these proteins tend to be very short-lived, are not abundant, or are expressed only in specific tissues and at specific times, all of which makes them even more difficult to detect.
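The effect of such a length cutoff can be illustrated with a minimal Python sketch. It is not the CRG team's tool; it is a deliberately simplified open reading frame (ORF) scan that assumes the standard genetic code, looks only at the forward strand, and uses a hypothetical function name (find_orfs) and a made-up toy sequence. Whether an ORF of exactly 100 amino acids is kept is also an assumption here.

```python
# Stop codons of the standard genetic code (an assumption for this sketch;
# some bacteria, including Mycoplasma, read TGA as an amino acid instead).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_aa=100):
    """Return (start, end, n_amino_acids) for forward-strand ORFs.

    An ORF runs from an ATG to the next in-frame stop codon and is kept
    only if it encodes at least min_aa amino acids.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop codon
                if n_aa >= min_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None
    return orfs


# Toy sequence: a 30-amino-acid ORF followed by a 120-amino-acid ORF.
small = "ATG" + "GCT" * 29 + "TAA"
large = "ATG" + "GCT" * 119 + "TAA"
toy = small + large

print(find_orfs(toy))            # only the 120-amino-acid ORF passes the cutoff
print(find_orfs(toy, min_aa=1))  # lowering the cutoff also reports the small ORF
```

With the default cutoff the 30-amino-acid ORF never appears in the output, which is the sense in which conventional annotation "does not see" small proteins.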
Moreover, functions are normally assigned to proteins by means of comparative conservation studies, in which a protein is sought in different organisms in order to ascertain how widespread it is, compare the lengths of the matching sequences and determine whether the similarity between them is significant. As these small proteins cannot be identified, this approach cannot be used to compare them between organisms, and so their role remains a mystery.
In this research, the investigators conducted a preliminary study of 109 bacterial genomes in which they tried to classify these proteins or assign functions to them. To this end, they applied algorithms already used in other settings, into which they input parameters related to the nature of a protein. They subsequently validated their findings using proteins already identified in other bacterial species.
The technique they developed is universal and may be applied to different bacterial species.