Distributed Virtual High-Throughput Screening
A boon for drug discovery
John H. Begemann, Ph.D.
Pharmaceutical discovery companies encounter numerous roadblocks between concept and commercialization. Chemistry-based drug firms, in particular, process large numbers of compounds in their search for new drugs. Whether chemical libraries are acquired through synthesis, purchase, license or corporate merger, their management and disposition are time- and labor-intensive.
At various times within the last 15 years, compound acquisition, library-related data management and compound testing have each been identified as bottlenecks in drug discovery. Although these operations have been addressed more or less successfully by software and automation vendors, paradoxically, approvals of New Chemical Entities (NCEs) continue to fall. The U.S. Food and Drug Administration (FDA) approved 33 new drugs in 1999, 26 in 2000, 24 in 2001 and just 17 in 2002. Despite the FDA’s efforts to streamline drug approval, companies now spend about 10 years and $800 million to launch a new pharmaceutical.
Extremely rapid virtual screening of combinatorial libraries eliminates unpromising compounds before they are synthesized.
The low NCE rate is partly due to corporate mergers and the FDA’s reluctance to approve “me-too” drugs that offer little or no improvement over existing products. Although 20 new drugs were approved in 2003 (as of November 30), the downward trend in NCEs combined with long development times and high costs spells trouble for drug developers.
Virtual high-throughput screening (vHTS), which reproduces chemical synthesis and compound screening in silico, could help to reverse the drug pipeline drain by significantly reducing the number of new structures chemists need to create while simultaneously improving the quality of selected structures.
Drug testing or screening requires drug-like compounds and assays to assess the compounds’ effectiveness. Since vHTS is performed on a computer rather than in a test tube, chemists need not synthesize compounds until a “hit” — a promising compound that tests positive in the virtual screen — is uncovered.
vHTS begins with digital representations of the chemical, structural, electronic and spatial attributes of putative drugs and their biological targets (usually proteins, and almost always large biomolecules). Scientists then look for virtual molecules that fit the virtual target’s active site. Virtual screening is actually more complex than the proverbial “lock and key” idea since, in addition to having the right size and shape, small molecules and their targets must also interact favorably, both chemically and electronically.
Figure 1: A set of three interactive pharmacophore constraints shown in the active site of carbonic anhydrase (1AZM). The white spherical constraint is an essential metal interaction at the zinc ion. The white disc-like constraint, associated with the backbone nitrogen of THR 199, defines an optional hydrogen donor interaction. The green constraint represents an optional hydrogen acceptor interaction that is being defined on THR 199.
vHTS programs can screen tens of thousands of virtual molecules in a fraction of the time that it would take to synthesize and test molecules in a lab. However, molecular modeling is CPU-intensive and, by comparison to everyday computer tasks, time-consuming.
As an example, consider a core molecule in which any of 100 different substituents may be inserted at three positions, R1, R2 and R3. These three substitutions represent 1 million (100³) compounds. Due to rotations about chemical bonds, some of the larger R groups also may assume multiple spatial orientations, or poses, which multiplies the data set accordingly.
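The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of the combinatorics, not how any vHTS package represents molecules: the substituent labels, the `"core"` placeholder and both function names are hypothetical, and a real tool would operate on chemical structures rather than strings.

```python
from itertools import islice, product


def library_size(n_substituents: int, n_positions: int) -> int:
    """Count the compounds when every position may take any substituent."""
    return n_substituents ** n_positions


def enumerate_library(core: str, substituents: list[str], n_positions: int):
    """Lazily yield (core, (R1, R2, ...)) for every substituent combination."""
    for combo in product(substituents, repeat=n_positions):
        yield core, combo


# Hypothetical labels standing in for 100 real substituents.
substituents = [f"R{i:03d}" for i in range(100)]

size = library_size(len(substituents), 3)
first_two = list(islice(enumerate_library("core", substituents, 3), 2))

print(size)       # 1000000
print(first_two)  # the first two of the one million combinations
```

Because the generator is lazy, the full million-member library never has to sit in memory at once. If some R groups adopt several poses, the number of structures to evaluate grows by the corresponding factor.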
Although physical million-compound libraries are quite common in large pharma, synthesizing or acquiring a library of that size from scratch would require an army of chemists working day and night for many years. vHTS programs “synthesize” a virtual chemical library of almost any size, from a core structure and specified substitution patterns, in minutes. Virtual screening, however, is significantly more complex and has only been possible on a large scale with the advent of inexpensive computing hardware.
Methodologies for virtual screening
Tripos, a computational chemistry software development company, offers a line of virtual screening tools in its SYBYL suite of computational chemistry products. SYBYL’s virtual screening modules work together or individually to solve practical medicinal chemistry and pharmacology problems. For example, the FlexX module uses incremental construction to dock or build ligands into target binding sites. FlexX incorporates protein-ligand interaction scores, ligand fragmentation along natural dividing points, ligand core placement in the active site, and reconstruction of the complete ligand from fragments. The FlexE module extends the FlexX docking algorithm by performing docking calculations using an ensemble of protein structures. This ensemble can represent different ionization states as well as different conformations of the protein active site. In this manner, FlexE allows the user to consider protein flexibility while docking. FlexX-Pharm takes the FlexX methodology one step further by incorporating pharmacophore-type restraints to guide ligand docking. [1]
Figure 2: An essential green spatial constraint for a carbon atom, shown in the active site of carbonic anhydrase (1AZM), defined by selecting two atoms from the screen and interactively moving the sphere along a line connecting the pair. The tolerance of the constraint is adjusted via a slider located in the interface shown in the upper right corner of the image.
Have it both ways
With the discovery bar raised, commercialization costs soaring, and development times creeping upward, companies prefer to identify new structures that operate through entirely novel mechanisms on novel targets, a “triple” that is becoming increasingly elusive. Drug discovery organizations currently are agonizing over the relative benefits of rational drug design on one hand, and the power of huge compound libraries on the other. Advocates of large libraries believe that chemical diversity is achievable only through sheer numbers. This philosophy, which was fueled by the combinatorial chemistry and high-throughput screening revolutions of the 1990s, holds that large libraries open the door to serendipity and help to overcome “noisy” assays.
Reliance on sheer numbers is waning, as pharmaceutical firms are returning to their roots: smaller, intelligently crafted libraries. The return to rational design reflects growing concern for the cost associated with acquiring, managing, and testing huge numbers of compounds.
Because it relies on computing power rather than physical experiment, vHTS offers discovery companies the benefits of large-library serendipity as well as the opportunity for rationally-crafted molecule collections. Since vHTS employs digital representations, fewer “real” compounds need to be synthesized. In the example of the molecule with three variable substituents, FlexX software performs the virtual synthesis, up to the “hit” stage, within hours.
Computational platforms
Figure 3: The results of a FlexX docking calculation performed using the four pharmacophore constraints, as defined in Figures 1 and 2, shown in the active site of carbonic anhydrase (1AZM).
Although vHTS saves considerable time and money, its implementation is only as good as the models for both virtual compounds and virtual target, and the software’s ability to detect favorable interactions. Virtual screens, like physical tests, must account for all potentially critical interactions, including those arising from steric and/or conformational poses, for both putative drug and target. In short, vHTS is extremely computation-intensive and, until recently, was limited to small collections of compounds and targets. Luckily, the low cost of computing power overcomes the high computational demands of vHTS.
Distributed computing platforms, in particular Linux clusters and grid computing, are relatively low-cost options for conducting large-scale, CPU-intensive molecular modeling experiments using components of the SYBYL product suite. For example, computing clusters based on the no-cost, open-source Linux operating system offer dedicated number-crunching. According to Dr. Christian Lemmen, CEO of BioSolveIT, a 100-node Linux cluster offers almost 100 times the speed and throughput of a standalone PC, with an additional tenfold enhancement accessible through “algorithm improvements.” The resulting massive throughput and speed have transformed computational chemistry from screening a single compound at a time against a single substrate into a highly parallel activity in which millions of putative compounds are represented and screened, often against multiple targets, in one virtual experiment.
Figure 4: FlexX docks small ligands into binding sites using incremental construction to actually build the ligands in the binding site.
The one-million-compound library described earlier could not be practically screened on an ordinary workstation, says Lemmen, because each compound would take about half a minute to render and dock. On a Linux cluster, 30 to 40 such compounds are virtually screened each second, and the entire library is screened in about one work shift.
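A quick back-of-the-envelope calculation shows how these figures fit together. The sketch below uses only the numbers quoted above (half a minute per compound on a workstation, 30 to 40 compounds per second on a cluster); the function name is illustrative, and the calculation ignores scheduling overhead and the extra cost of multiple poses.

```python
def hours_to_screen(n_compounds: int, compounds_per_second: float) -> float:
    """Wall-clock hours needed to screen a library at a steady throughput."""
    return n_compounds / compounds_per_second / 3600


LIBRARY = 1_000_000

workstation_hours = hours_to_screen(LIBRARY, 1 / 30)  # 30 s per compound
cluster_hours_low = hours_to_screen(LIBRARY, 30)      # 30 compounds per second
cluster_hours_high = hours_to_screen(LIBRARY, 40)     # 40 compounds per second

print(f"Workstation: {workstation_hours:,.0f} h "
      f"(about {workstation_hours / 24:.0f} days)")
print(f"Cluster: {cluster_hours_high:.1f} to {cluster_hours_low:.1f} h")
```

At the quoted rates the workstation would need roughly 8,300 hours, or close to a year of continuous running, while the cluster finishes in roughly seven to nine hours, which matches the "one work shift" estimate.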
Linux clusters enable parallelization of not just virtual compound collections, but targets as well. Testing for activity against, say, a family of related enzymes is critical to assure a drug’s specificity for the disease-mediating target. “You want to knock out the right kinase,” Lemmen states, “not every kinase in the body. A Linux cluster very easily cuts up and assigns this huge, multiplexed task among all the processors.”
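The idea of cutting up a multiplexed compound-by-target screen can be illustrated with a simple round-robin partition. This is a minimal sketch under stated assumptions, not a description of how SYBYL, FlexX or any cluster scheduler actually distributes work: the function name, the compound and kinase labels, and the node count are all hypothetical, and a production cluster would normally rely on a job scheduler that also balances uneven run times.

```python
from itertools import product


def assign_jobs(compounds, targets, n_nodes):
    """Deal every (compound, target) docking job out to n_nodes work lists."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    work = [[] for _ in range(n_nodes)]
    for i, job in enumerate(product(compounds, targets)):
        work[i % n_nodes].append(job)  # round-robin assignment
    return work


# Hypothetical inputs: 10 compounds screened against 3 related kinases.
compounds = [f"cpd{i}" for i in range(10)]
targets = ["kinase_A", "kinase_B", "kinase_C"]

work = assign_jobs(compounds, targets, n_nodes=4)

for node, jobs in enumerate(work):
    print(f"node {node}: {len(jobs)} jobs, first = {jobs[0]}")
```

Each docking job is independent of the others, which is why the workload divides so cleanly: no node needs results from another node until the scores are collected at the end.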
Linux implementations are growing rapidly, but IT personnel are often less familiar with the operating system than they are with Windows. For example, Linux installation and maintenance are more complex than those of an equivalent Microsoft Windows-based OS. In addition, Linux machines generally will not run Windows applications.
Another distributed computing approach, grid computing, utilizes idle CPU time across networks to achieve the same goal as Linux clusters. On the plus side, grid computing requires no additional hardware or software investment and can operate under Windows NT, Linux and other environments. Like Linux clusters, grid computing networks can be linearly scalable, so the computing power of even a mid-sized network can be quite impressive, especially after regular business hours. It is no exaggeration to say that the CPU power of a 10,000-CPU network at a large pharmaceutical company is nearly unfathomable.
According to Tim Williams, Sr. Product Manager at United Devices (Austin, TX), grid computing networks interoperate among operating systems, do not require that applications be written specifically for the grid environment, and may even include dedicated computing resources such as Linux clusters as part of the grid fabric. Today’s advanced grid computing software dynamically matches applications that require dedicated resources to a dedicated environment, such as a cluster, and also leverages the expansive power of a company’s network and PC architecture as additional resources for those applications capable of utilizing this power. Best of all for cost-conscious pharmaceutical companies, grids efficiently utilize the untapped CPU power of existing assets and operate under familiar OS platforms. The result is a massive increase in speed and scope, together with lower traditional IT costs.
According to Lemmen, supercomputers, the alternative to distributed computing, are “overkill” for tasks like vHTS. Supercomputers are much more expensive than a 100-node cluster, use proprietary software, and are nowhere near as accessible to the average chemist. “Cheap hardware provides every molecular modeler access to this amazing computing power,” he explained.
Conclusion
In a relatively short time, Linux clusters and grid computing have transformed and multiplexed computational drug discovery. vHTS software such as SYBYL from Tripos and the compatible Flex-software suite from BioSolveIT taps into this computing power by providing rapid, accurate representations of real-world pharmacology in an in silico environment. A major benefit has been the empowerment of individual chemists to perform high-level virtual studies. Similarly, powerful computing at reasonable cost enables small discovery-based firms to compete with much larger competitors, at least in the realm of virtual synthesis and screening.
The commoditization of the CPU, coupled with science-enabling software, has been a boon to life science industries that rely on information processing, modeling of physical events, and good old-fashioned number-crunching. There is no reason to believe that cheaper, faster, more efficient computing hardware and more-powerful software will not continue to benefit drug developers for the foreseeable future.
End Note
1. FlexX, FlexX-Pharm and FlexE virtual screening tools were developed by BioSolveIT GmbH (St. Augustin, Germany). Tripos and BioSolveIT collaborate under a distribution partnership to market the Flex-software suite. The SYBYL platform is not required to run these stand-alone modules.
John H. Begemann, Ph.D., is Vice President of Discovery Software at Tripos, Inc.