We may well be on the right path, but the fine-tuning could surprise us
In upcoming columns, I may expand upon each of the three topics above. For now, however, let's begin by talking about how statistical programming, genomic analysis and biotechnology's future are interconnected. Each is intimately connected to the usual rants that periodically appear in this column, but several recent occurrences have triggered this reevaluation of the supposed state of the art.
Freeman Dyson, physicist and futurist, recently authored a refreshingly broad approach to biotechnology in the New York Review of Books.1 As scientists, we tend to
take a mechanistic, reductionist approach to problems, reducing broad biological systems to genes, proteins and small molecules. We then apply sophisticated mathematics to this examination and finally, recognizing that we might have lost too much information in the process, we overlay more complexity with software, purporting to reconnect the lost associations. From a drug discovery standpoint, this has definitely not worked. The reasons are many, and they are the subject of this month's gripe.
In his very readable and seemingly prescient article, Dyson states that, whereas twentieth-century science was dominated by physics, the twenty-first will be dominated by biology. He goes on to say that biology is not only bigger than physics (budget, workforce, major discoveries) but now also more important, as measured by its economic consequences and impact on human welfare. He then discusses new biological technologies in terms of engineering new organisms and the resultant technologies (as opposed to "old-fashioned" gene insertion). A central theme of this discussion is that, to accomplish any major strides, we need to begin viewing organisms (including man) as patterns of organization rather than collections of genes and molecules. It appears that this is Dyson's present view of "getting the big picture."
To attain the grand vision that he lays out will take quite a few very large strides and, to do this, we need to adjust our thinking somewhat. Again, we can examine the problem from a drug development standpoint. Each week, a plethora of papers crosses our desks, each proffering a new procedure (statistical or data mining) for discovering and refining a gene list that will either
1. give us a more accurate or relevant list, or
2. illuminate a previously unknown pathway, protein, or disease connection.
I'm now approximately 400 papers behind on these, as they entail some "interesting" mathematics but very little connection to the real world. The basic flow of their story begins with a critique of what's already being used. Meanwhile, the hard questions go unanswered: how can we define a (the?) minimal gene set that is necessary and sufficient to discover the minimal number of pathways that must be disrupted, allow for feedback effects many pathways removed, and do no damage? It seems of late that we cannot even define a gene as a definite nucleotide sequence with exact starts and stops, nor do we know how to incorporate events such as mutation and aneuploidy. What's the problem? Could it be the way we are fundamentally viewing the processes? We may be correct in attempting to connect all of this database software, but the fine-tuning may be a bit different than we think.
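For readers who have not waded through that stack of papers, the generic shape of such a gene-list procedure can be sketched in a few lines. This is purely illustrative, not any specific published method: simulated expression data, a per-gene Welch t-test, and a Benjamini-Hochberg false-discovery-rate cutoff; all data, sample sizes and thresholds here are invented.

```python
# Illustrative sketch of a generic gene-list selection procedure:
# rank genes by a two-sample t-test between "disease" and "control"
# samples, then keep those passing a Benjamini-Hochberg FDR cutoff.
# All numbers are simulated, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_samples = 1000, 10

# Simulated expression matrix: most genes are null; the first 50
# are shifted upward in the disease group.
control = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
disease = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
disease[:50] += 2.0

# Per-gene Welch t-test (unequal variances), one p-value per gene.
_, pvals = stats.ttest_ind(disease, control, axis=1, equal_var=False)

# Benjamini-Hochberg: reject the k smallest p-values, where k is the
# largest index with p_(k) <= (k / m) * alpha.
alpha = 0.05
order = np.argsort(pvals)
m = len(pvals)
thresh = alpha * (np.arange(1, m + 1) / m)
passed = pvals[order] <= thresh
k = int(np.max(np.nonzero(passed)[0])) + 1 if passed.any() else 0
gene_list = sorted(order[:k].tolist())

print(len(gene_list))  # number of genes surviving the FDR cutoff
```

The sketch recovers roughly the 50 planted genes, which is exactly the point of the gripe: on clean simulated data these procedures look splendid, while the real difficulties (pathway feedback, mutation, aneuploidy, even defining a gene) live outside the model entirely.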
1. Freeman Dyson. "Our Biotech Future." The New York Review of Books, Vol. 54, No. 12, July 19, 2007.
John Wass is a statistician based in Chicago, IL. He may be reached at editor@ScientificComputing.com.