Today the international ENCODE (Encyclopedia Of DNA Elements) Project presented an overview of its ongoing, large-scale efforts to interpret the human genome sequence. Published in the journal PLoS Biology, “A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)” helps scientists interpret the vast array of data and resources produced by the project. All of the data, the tools to study them, and the paper itself are freely available through the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in the UK.
The Human Genome Project and subsequent large-scale genomic efforts have been carried out with the belief that the data they produce should be made freely available to the scientific community to facilitate discovery. The ENCODE database (genome.ucsc.edu/ENCODE) makes its genomic data and related information both available and accessible, offering web-based tools (encodeproject.org) that make it easier for researchers to use the data.
The User’s Guide shows how the data can be immediately useful in interpreting associations between single-nucleotide variants and disease. For example, DNA variants upstream of the c-Myc proto-oncogene are known to be associated with multiple cancers, but until recently the mechanism behind this association had not been determined. ENCODE data show that the variants can change the binding of transcription factor proteins to an enhancer region, which leads to changes in expression of the c-Myc gene and therefore to the onset of cancer. Similar studies are now possible for the thousands of variants identified in genome-wide association studies. This will go a long way toward addressing mechanistic questions about susceptibility to a wide range of human diseases.
Ewan Birney, Senior Team Leader at EMBL-EBI, commented, “We knew four years ago, from our publication of ENCODE techniques on 1% of the genome, that we had an unprecedented view of how biology works on those regions. By extending our work to the entire genome, we see the immediate impact on the interpretation of noncoding variants identified in genome-wide association studies. These studies are disease-driven but have not always yielded clear next steps; ENCODE data can open up new paths to follow.”
Scientists with the ENCODE Project are applying up to 20 different tests in 108 commonly used cell lines to compile these important data. “Assays that are now fundamental to biology, such as chromatin immunoprecipitation and sequencing or ChIP-seq, were produced by the ENCODE Project,” commented John Stamatoyannopoulos of the University of Washington School of Medicine in the US. “Widely used computational tools for processing and interpretation of large-scale functional genomic data have also been developed by the project. The depth, quality, and diversity of the ENCODE data are unprecedented.”