Next-generation sequencing, with its capacity to sequence DNA at unprecedented speed, has been yielding massive amounts of data for biological applications. According to a recent study published in PLOS Biology, between 100 million and as many as 2 billion human genomes are likely to be sequenced by 2025, representing four to five orders of magnitude of growth in 10 years. The opportunities for the life-science market are tremendous, particularly with respect to precision medicine, provided all that data can be effectively processed and reliably interpreted.
The consumer genomics market is also booming. Companies such as Veritas Genetics, which recently offered individuals the opportunity to have their genomes sequenced and made accessible on a smartphone app for $999, will drive the public’s expectations about what their genome can reveal and what can be done about those revelations.
Of course, having a genome sequenced, in and of itself, means little. For an individual, the information contained therein needs to be interpreted in context: What known susceptibility genes are present? What might activate them or keep them at bay? Although predictive analytics might suggest that, 20 years down the road, a particular individual will experience early symptoms of dementia, for example, the reality is that the prediction may not come true. Lifestyle changes, medications not yet on the market or other treatments not yet imagined may intervene. When it comes to interpreting data in ways that can significantly impact a person’s life, the operative word is “caution.”
In early-stage drug discovery R&D, it’s also important not to jump to conclusions based solely on genomic data. The presence or absence of certain genomic variants, such as single nucleotide polymorphisms (SNPs), may or may not be relevant to an individual’s response to a particular compound. This information, too, must be viewed in context: specifically, alongside what is already known about the gene or variant in the peer-reviewed literature, clinical trial data, electronic medical record (EMR) systems, regulatory data, mobile and diagnostic monitoring and other sources, in order to inform decision making.
Data to knowledge
Collaboration is critical for gaining context and, ultimately, for maximizing the potential of genomic data to inform both the general public and life-science researchers. Unless that data is responsibly shared, companies will end up with small, private collections from which it won’t be possible to draw meaningful conclusions.
DNAdigest, a Cambridge, U.K.-based nonprofit that aims to facilitate sharing of genomics data, has compiled a list of close to two dozen repositories, such as the European Nucleotide Archive and the DNA Data Bank of Japan, where researchers can download or upload genomic data, as well as a list of downloadable genomic data collections, such as The Cancer Genome Atlas and the 1000 Genomes Project.
The nonprofit is also building an online platform, Repositive, that indexes human genomic data stored in repositories and provides researchers with an easy-to-use interface to access that data at no charge. For these kinds of initiatives to succeed and benefit the research community, data standards or, at a minimum, common ways of sharing data must be agreed upon. This will cut down on the amount of work involved in harmonizing data before it can be processed and queried.
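As a toy illustration of that harmonization step, the sketch below maps variant records from two hypothetical repositories into one common schema so they can be queried together. The repository labels, field names and records are invented for illustration, not drawn from any real repository’s API:

```python
# Hypothetical harmonization layer: two sources describe the same SNP
# with different field names; a per-source mapping renames the native
# fields to one shared schema.

FIELD_MAPS = {
    "repo_a": {"chrom": "chr", "position": "pos", "rsid": "id"},
    "repo_b": {"chromosome": "chr", "coord": "pos", "snp_id": "id"},
}

def harmonize(record: dict, source: str) -> dict:
    """Rename a record's source-specific fields to the common schema."""
    mapping = FIELD_MAPS[source]
    return {common: record[native] for native, common in mapping.items()}

# The same (illustrative) variant as reported by each source:
a = harmonize({"chrom": "11", "position": 5227002, "rsid": "rs334"}, "repo_a")
b = harmonize({"chromosome": "11", "coord": 5227002, "snp_id": "rs334"}, "repo_b")
print(a == b)  # True: both records now share one queryable schema
```

In practice this mapping work is exactly what agreed-upon standards would make unnecessary, which is why a common exchange format saves so much downstream effort.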
Knowledge to understanding
While it’s important to have access to as much genomic data as possible in usable formats, gleaning clinically relevant insights from the data will require additional collaborations. Science and technology companies must work together to facilitate efforts to characterize and compare data during early-stage drug discovery, and to make sense of input coming in on the consumer side.
Several open-source initiatives are already engaged in such collaborations. For example, the New York Genome Center and IBM are collaborating to create an open repository of genetic data to accelerate cancer research and, ultimately, to use insights from IBM Watson to inform personalized treatment decisions. In the first phase of the project, the two organizations will examine genetic information from 200 cancer patients. Sequencing and clinical data will be fed into Watson for analysis, with the goal of identifying existing drugs that might target an individual patient’s specific cancer-causing mutations. Any clinically relevant insights would be shared with each patient’s physician to potentially support treatment decisions.
On a broader scale, the U.S. government’s Precision Medicine Initiative has held a summit during which more than 40 private-sector organizations and various federal agencies committed to participating in collaborative efforts to move genomic-based medicine forward. One such effort is the PrecisionFDA Consistency Challenge, which calls on the genomics community to “further assess, compare, and improve techniques used in DNA testing,” and advance quality standards.
Another example of science/technology collaboration is the Precision Medicine Initiative Cohort Program. Vanderbilt University will collaborate with Verily (formerly Google Life Sciences) to launch the first phase of the program, which is defined as a “participant-engaged, data-driven enterprise supporting research at the intersection of human biology, behavior, genetics, environment, data science and computation, and much more to produce new knowledge with the goal of developing more effective ways to prolong health and treat disease.”
Against that backdrop, in addition to collaborating with the larger community, pharmaceutical companies need to be implementing tools that can derive insights for their own drug discovery and development programs. As noted earlier, this requires the ability to harmonize and make sense of data from disparate sources, not just from the literature.
Genomic data, which is just one type of data input, is itself vast. A key resource is the dbSNP database, which currently catalogs more than 200 million known human SNPs.
While many SNPs may have no biological impact, and others may simply provide the basis for benign differences between individuals, many of these variations in human genomic sequences lead to medically relevant phenotypes. These nucleotide changes may cause disease or confer greater susceptibility to certain medical conditions. Identifying these variants and understanding their impact allows for better comprehension of disease mechanisms and prediction of disease development as well as identification of optimum medications and therapies for individuals.
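One concrete way a nucleotide change does or does not alter a protein can be sketched with the standard genetic code: a base change that leaves the encoded amino acid unchanged is synonymous, while one that swaps it is missense. The function and abbreviated codon table below are our own illustration, not part of any particular annotation pipeline:

```python
# Standard genetic code, abbreviated to the codons used in this example.
CODON_TABLE = {
    "GAA": "E", "GAG": "E",  # glutamate
    "GTG": "V",              # valine
}

def snp_effect(codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Classify a single-base change in a codon as synonymous or missense."""
    alt_codon = codon[:pos_in_codon] + alt_base + codon[pos_in_codon + 1:]
    return "synonymous" if CODON_TABLE[codon] == CODON_TABLE[alt_codon] else "missense"

# GAA -> GAG still encodes glutamate, so the protein is unchanged;
# GAG -> GTG substitutes valine for glutamate, the classic missense
# change behind sickle cell disease (E6V in beta-globin).
print(snp_effect("GAA", 2, "G"))  # synonymous
print(snp_effect("GAG", 1, "T"))  # missense
```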
Currently available technology enables R&D departments to analyze these variants according to factors such as their location on a chromosome; location within the gene itself (e.g., in the coding region, the intron, the regulatory region for expression); whether the SNP affects a protein’s sequence and, consequently, its function and activity; what diseases or cell processes are associated with the gene containing the SNP of interest; and whether there is any known clinical impact reported for a specific SNP.
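A minimal sketch of the location-based part of that analysis, assuming a toy gene model with invented coordinates (real annotation uses full genome builds and transcript models):

```python
# Hypothetical gene model: each region is a list of (start, end)
# intervals. Coordinates are illustrative only.
GENE_MODEL = {
    "regulatory": [(100, 199)],                # e.g. upstream promoter
    "coding":     [(200, 299), (400, 499)],    # exons
    "intron":     [(300, 399)],
}

def classify_position(pos: int) -> str:
    """Return the gene region containing pos, or 'intergenic' if none."""
    for region, intervals in GENE_MODEL.items():
        if any(start <= pos <= end for start, end in intervals):
            return region
    return "intergenic"

print(classify_position(150))  # regulatory
print(classify_position(250))  # coding
print(classify_position(350))  # intron
print(classify_position(999))  # intergenic
```

Layering on the other factors the text lists, such as protein-level effect and known clinical associations, is what turns this positional lookup into a useful variant annotation.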
Access to this type of knowledge, which is continually evolving, will not only drive R&D forward—in collaborations with companies working directly with consumers, for example, it can help inform individuals about new research relevant to a portion of their own genetic sequence.
Looking ahead
Many life-science companies today are understandably interested in the power of genomic data to facilitate the development of targeted therapeutics and to identify subpopulations whose genes suggest they may be responders. But understanding genomic data opens the door to so much more. It forms the foundation of what Leroy Hood, president and cofounder of the Institute for Systems Biology, refers to as P4 Medicine (predictive, preventive, personalized and participatory). Life-science companies have roles to play in each of these areas: personalizing treatments, predicting response to medications and vulnerability to disease, helping to prevent disease progression and encouraging patients to provide the genomic input necessary for R&D to progress.
These potential achievements are not without potential drawbacks. What, for example, are the consequences of telling a 35-year-old that he has a variant combination that typically triggers multiple sclerosis at age 45? Given the lack of preventive treatments, are there mitigating factors that might affect this prognosis?
Responsible oversight will be needed to ensure that the tremendous insights we glean from genomic data are appropriately communicated to individuals, payers and governmental entities. In fact, the ethical aspects of all facets of the process—gathering the data (informed consent), handling the data, and returning results—also must be addressed. These are important issues with no easy answers. All stakeholders—again, collaboratively—will need to take part in the discussions so that society can derive the most benefit from our current efforts and achievements.