The Cancer Genome Atlas (TCGA) is a comprehensive and organized effort to speed up the world’s understanding of cancer by using large-scale genome sequencing and bioinformatics to catalog genetic mutations responsible for cancer.
There are at least 200 forms of cancer and many more subtypes. Cancer is caused by an accumulation of DNA errors, or mutations, that allows cells to proliferate in an uncontrolled manner. Each cancer subtype has its own unique signature of DNA mutations in its genome; identifying these mutations and understanding how they interact to drive the disease is the foundation for improving cancer prevention, early detection and treatment.
TCGA’s finalized tissue collection contains matched tumor and normal tissues from 11,000 patients, and allows for the comprehensive characterization of 33 cancer types and subtypes, including 10 rare cancers. The comprehensive data that have been generated by TCGA’s network approach are freely available and widely used by the cancer community through the TCGA Data Portal and the Cancer Genomics Hub (CGHub).
In 2012, Cycle Computing and a multinational biotechnology company partnered to leverage cloud computing to analyze TCGA data in a unique way. The biotechnology firm had developed a new end-to-end solution to identify DNA mutations in the TCGA data that could act as markers and risk factors in cancer samples. This solution included the typical SNP and DNA variation workflow, as well as a custom pipeline for discovering gene fusions and chromosome aberrations.
Performing the research on the firm’s internal servers would have required decades of computation. Cycle Computing’s CycleCloud software, however, gave the biotechnology firm complete flexibility and control over the workflows it wanted to run in the cloud. This enabled the firm to easily deploy its TCGA analysis workflow on a cluster of more than 8,000 cores, running across more than 2,000 instances in Amazon Web Services (AWS).
The impact on the research was profound and immediate: 15.6 computing years accomplished in six months; 4,178 tumor samples, 19 cancer types, and 32.2 terabytes of data processed.
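To put those figures in proportion, the short Python sketch below derives two rough ratios from them. It is only back-of-the-envelope arithmetic on the numbers quoted above, not part of the firm's pipeline: the function names are made up for illustration, "six months" is treated as 0.5 years, a terabyte is assumed to be 1,000 gigabytes, and the article does not say whether "computing years" means core-years or server-years, so the first ratio is simply compute time divided by elapsed time.

```python
# Figures quoted in the article.
COMPUTE_YEARS = 15.6      # "15.6 computing years"
WALL_CLOCK_YEARS = 0.5    # "six months" (assumed to mean half a year)
SAMPLES = 4_178           # tumor samples processed
DATA_TB = 32.2            # terabytes of data processed


def effective_speedup(compute_years: float, wall_clock_years: float) -> float:
    """Ratio of compute time consumed to calendar time elapsed."""
    return compute_years / wall_clock_years


def avg_gb_per_sample(data_tb: float, samples: int, gb_per_tb: int = 1000) -> float:
    """Average data volume per sample, assuming decimal units (1 TB = 1,000 GB)."""
    return data_tb * gb_per_tb / samples


if __name__ == "__main__":
    speedup = effective_speedup(COMPUTE_YEARS, WALL_CLOCK_YEARS)
    per_sample = avg_gb_per_sample(DATA_TB, SAMPLES)
    print(f"Compute time / elapsed time: about {speedup:.1f}x")
    print(f"Average data per sample: about {per_sample:.1f} GB")
```

Under these assumptions the work amounts to roughly 31 times more computing than elapsed time, and a little under 8 GB of data per tumor sample on average.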
TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), two of the 27 Institutes and Centers of the National Institutes of Health, U.S. Department of Health and Human Services.
Rob Futrick is CTO of Cycle Computing.