
Figure 1. Through code optimization, Professor Warfield and his team reduced DCI estimation runtimes by up to 75X on multicore Intel Xeon processors [2]. Running the same code on the many-core Intel Xeon Phi processor 7210 provided additional gains of up to 2.13X [2].
Today’s most widely used medical imaging technologies have significant limitations when applied to the human brain. Traditional CT and MRI scans, for example, provide useful information about the bones and the blood vessels in the brain, but very little about neurons and other soft tissues. As a result, the most complex and important interactions within the brain have remained all but invisible.
Pulling Back the Curtains on Brain Function—and Dysfunction
Diffusion-Weighted Imaging (DWI) fills the diagnostic gap, providing a major step forward for research and medical teams working on brain-related issues. DWI applies pulsed magnetic field gradients during an MRI scan, which makes it possible to measure the microscopic movements of water molecules (called Brownian motion). The differing magnitudes and directions of these movements provide the raw data needed to identify microstructures in the brain and to assess the integrity and health of neural tracts and other soft tissues.
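As a rough illustration of how water movement turns into a measurable signal, the sketch below uses the standard mono-exponential diffusion model, S = S0 * exp(-b * D). The article does not describe the model used by the team, so this is only the textbook relationship; the function names and the round-number values for the b-value and diffusion coefficients are assumptions chosen for illustration.

```python
import math

def dwi_signal(s0, b_value, diffusivity):
    """Textbook mono-exponential attenuation: S = S0 * exp(-b * D).

    s0          -- signal without diffusion weighting (arbitrary units)
    b_value     -- diffusion weighting in s/mm^2
    diffusivity -- apparent diffusion coefficient in mm^2/s
    """
    return s0 * math.exp(-b_value * diffusivity)

def estimate_diffusivity(s0, s_b, b_value):
    """Invert the model to recover D from two measurements."""
    return math.log(s0 / s_b) / b_value

# Illustrative round numbers (assumptions, not data from the article).
B = 1000.0            # s/mm^2
D_TISSUE = 0.7e-3     # mm^2/s, restricted diffusion in tissue
D_FREE_WATER = 3.0e-3 # mm^2/s, nearly unrestricted diffusion

tissue_signal = dwi_signal(1.0, B, D_TISSUE)      # about 0.50 of S0
water_signal = dwi_signal(1.0, B, D_FREE_WATER)   # about 0.05 of S0
recovered = estimate_diffusivity(1.0, tissue_signal, B)
```

Faster-moving water loses more signal, which is why the measured attenuation carries information about the surrounding tissue.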
Instead of diagnosing problems by assessing cognitive behavior, researchers and clinicians can now look directly at the brain. Are there lesions or other signs of damage? Are there disconnections, disruptions, or reorganization of neural pathways? By comparing the results with those of a normal brain at the same stage of development, researchers gain deeper insight into underlying causes and clinicians have better information for diagnosing problems, planning interventions, and measuring the impact of treatment.
DCI: Looking Even Deeper and More Accurately
Simon Warfield, Professor of Radiology at Harvard Medical School, has taken DWI a step further with his Diffusion Compartment Imaging (DCI) technique, which is able to extract additional information about soft tissues. For example, DCI can identify multiple neural tracts oriented in different directions, even when they reside within the same image voxel (the smallest volume used in a DCI analysis). Results are not only more detailed, but also more accurate.
Professor Warfield and his team are using DCI to investigate multiple medical issues.
- Concussions. DCI enables quick identification of brain damage that might have lasting impact. This is critical, since internal metabolic reactions within the first 48 hours following a concussion can lead to even more damage. Preventing ongoing injury is a topic of much research, and DCI is a valuable tool for such studies. DCI can also help doctors quickly and accurately identify patients who are at risk of further injury, and then monitor the progress of their treatment.
- Autism Spectrum Disorder. A national network, called the Tuberous Sclerosis Complex Autism Center of Excellence Research Network (TACERN), is longitudinally studying children who are genetically predisposed to autism spectrum disorder (ASD). Professor Warfield and his team have identified neural circuits associated with risk factors for ASD, and can now identify the infants who are most likely to be diagnosed with ASD with 86% sensitivity and 80% specificity. This information will help to accelerate research and provide better diagnostic tools. It may eventually be used to directly help at-risk children by enabling earlier intervention.
- Pediatric Onset Multiple Sclerosis (MS). A child’s brain appears to be more resilient to MS attacks than an adult brain, at least initially. However, as tissue damage accumulates over time, it seems that the brain may reach a point where that resilience is lost, resulting in sudden and rapid deterioration. With DCI, doctors will be able to monitor tissue damage and use more aggressive treatments as appropriate, while balancing the benefits of preventing new lesions against the harmful side effects of the drugs.
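To make the sensitivity and specificity figures quoted for ASD more concrete, here is a minimal sketch of how the two measures are computed. The cohort sizes and counts are hypothetical and chosen only so that the arithmetic reproduces 86% and 80%; they are not data from the TACERN study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of children later diagnosed who were correctly flagged."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of children not diagnosed who were correctly cleared."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical cohort: 100 infants later diagnosed, 100 not diagnosed.
flagged_and_diagnosed = 86      # true positives
missed = 14                     # false negatives
cleared_and_undiagnosed = 80    # true negatives
flagged_but_undiagnosed = 20    # false positives

sens = sensitivity(flagged_and_diagnosed, missed)
spec = specificity(cleared_and_undiagnosed, flagged_but_undiagnosed)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this hypothetical cohort, the method would miss 14 of every 100 infants who go on to be diagnosed and would flag 20 of every 100 who do not.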
Speed Matters
Professor Warfield and his team are committed to making DCI a clinically useful tool that can be integrated efficiently into existing radiological workflows. During their early work on DCI, long data processing times were a roadblock to realizing this vision. A single DCI scan may take up to an hour and generate tens of gigabytes of water diffusion data. The calculations required to reconstruct the high-resolution images are complex and compute-intensive, and initially took roughly 43 hours to complete. Such long processing times would interrupt normal diagnostic procedures in radiology departments. They would also make the images unusable in many emergency situations.
Image Processing in Just 16 Minutes
To address this issue, Professor Warfield and his team worked to optimize their code so it would perform and scale more effectively on today’s multi-core and many-core processors. The team is well suited to the task. In addition to his other roles, Professor Warfield is the Director of the Computational Radiology Laboratory at Boston Children’s Hospital, which has been an Intel Parallel Computing Center (IPCC) for several years. His team is experienced in code optimization and has access to Intel computing platforms, as well as Intel software tools and libraries for developing and optimizing applications for high performance.
The optimization efforts focused heavily on two key principles:
- Vectorization enables the code to make better use of advanced single instruction multiple data (SIMD) execution resources in modern processors. SIMD capabilities continue to advance with each new processor generation, so that more calculations can be performed in each clock cycle for properly optimized code.
- Multi-threading enables the code to execute efficiently across large numbers of cores. An important part of this optimization effort involved using Intel® Threading Building Blocks (Intel® TBB) for spawning and managing software threads. This runtime library provides a dynamic, low-overhead approach to segmenting and balancing workloads across large numbers of cores and threads. It can often provide significant improvements in parallel execution, while allowing developers to focus on high-level tasks rather than the details of thread management.
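The multi-threading idea above can be sketched as splitting independent per-voxel work into chunks and handing those chunks to a pool of workers. The example below is not the team's code and does not use Intel TBB; it uses Python's standard `concurrent.futures` thread pool as a stand-in, and `fit_voxel` is a hypothetical placeholder for the real per-voxel model estimation. Note that CPython threads will not speed up pure-Python arithmetic, so this shows only the partitioning pattern, not the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def fit_voxel(signal):
    """Hypothetical stand-in for a per-voxel fit (here: mean log-signal)."""
    return sum(math.log(s) for s in signal) / len(signal)

def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fit_chunk(chunk):
    """Process one chunk of voxels; chunks are independent of each other."""
    return [fit_voxel(voxel) for voxel in chunk]

def fit_all(voxels, workers=4, chunk_size=64):
    """Distribute chunks across a worker pool and reassemble in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in submission order, so voxel order is kept.
        parts = pool.map(fit_chunk, chunks(voxels, chunk_size))
        return [value for part in parts for value in part]

# Synthetic input: 1000 voxels with three measurements each.
voxels = [[1.0 + 0.01 * i, 2.0, 3.0] for i in range(1000)]
parallel = fit_all(voxels)
serial = [fit_voxel(v) for v in voxels]
```

The chunk size controls the trade-off between scheduling overhead and load balance: smaller chunks balance better across workers, larger chunks cost less to dispatch. A task-based runtime such as Intel TBB makes that kind of decision dynamically.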
The results of the optimization work were transformative, providing a 75X performance improvement when running on Intel Xeon processors and a 161X improvement running on Intel Xeon Phi processors (Figure 1). A complete DCI study can now be processed in just over 16 minutes on a single workstation, so radiologists can begin interpreting results very quickly after a scan has been completed.
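The quoted figures can be cross-checked with simple arithmetic. The sketch below assumes the "roughly 43 hours" baseline and treats the "up to" speedups as if they applied to the whole run, so the results are approximate; the variable names are illustrative.

```python
# Figures quoted in the article.
baseline_minutes = 43 * 60   # original runtime of roughly 43 hours
xeon_speedup = 75            # up to 75X on Intel Xeon processors
xeon_phi_gain = 2.13         # up to 2.13X additional gain on Intel Xeon Phi

# Runtime after each optimization stage.
xeon_minutes = baseline_minutes / xeon_speedup    # about 34 minutes
xeon_phi_minutes = xeon_minutes / xeon_phi_gain   # just over 16 minutes

# Overall speedup, computed two ways.
combined_speedup = xeon_speedup * xeon_phi_gain   # about 160X
speedup_from_runtimes = baseline_minutes / 16     # about 161X

print(f"Xeon: {xeon_minutes:.1f} min, Xeon Phi: {xeon_phi_minutes:.1f} min")
print(f"combined: {combined_speedup:.0f}X, from runtimes: {speedup_from_runtimes:.0f}X")
```

Both routes land close to the 161X and 16-minute figures in the text, with the small gap explained by rounding in the quoted numbers.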
The optimized code is now available in the Insight Toolkit (ITK), a widely used open-source library for medical image processing. In addition to the software, the contribution to ITK includes information and examples that can help developers extend this optimization approach to other applications.

Figure 2. Improving DCI image resolution from 8 mm³ to 1 mm³ voxels increased DCI estimation runtimes from 16 minutes to more than 2 hours. Running the optimized code on larger servers with more recent processors can help. Benchmarks showed that a four-socket server based on the Intel Xeon Platinum 8180 processor could reduce runtimes to just 39 minutes, a 3.2X improvement versus a previous-generation, two-socket server [3]. (The test workload was based on data from the Human Connectome Project diffusion MRI acquisition.)
Increasing Image Resolution by 10X
Data processing requirements are invariably a moving target. Professor Warfield’s team has already started working with higher-density DCI techniques that use smaller voxels (1 mm³ versus 8 mm³) to improve image resolution. The higher-resolution data adds to the processing load, which has increased processing times to more than two hours.
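A back-of-the-envelope estimate shows why smaller voxels push the runtime past two hours. The sketch assumes cubic voxels and assumes that processing cost grows linearly with the number of voxels; neither assumption is stated in the article.

```python
def voxel_count_ratio(old_voxel_mm3, new_voxel_mm3):
    """How many new voxels are needed to cover one old voxel's volume."""
    return old_voxel_mm3 / new_voxel_mm3

old_volume_mm3 = 8.0
new_volume_mm3 = 1.0

ratio = voxel_count_ratio(old_volume_mm3, new_volume_mm3)

# Edge lengths, assuming cubic voxels.
old_edge_mm = old_volume_mm3 ** (1 / 3)
new_edge_mm = new_volume_mm3 ** (1 / 3)

# Runtime estimate if cost scales linearly with voxel count (assumption).
runtime_at_8mm3_minutes = 16
estimated_minutes = runtime_at_8mm3_minutes * ratio

print(f"{ratio:.0f}x voxels, edge {old_edge_mm:.1f} mm -> {new_edge_mm:.1f} mm")
print(f"estimated runtime: {estimated_minutes:.0f} minutes")
```

Under these assumptions the same brain volume contains eight times as many voxels, and the estimate of 128 minutes is consistent with the reported runtime of more than two hours.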
Fortunately, when code optimization is done properly, an application will tend to scale more efficiently as processors evolve to provide more cores and threads. To verify these benefits, the team recently benchmarked the DCI data processing software on a selection of newer processors. Using diffusion MRI data from the Human Connectome Project, they measured the time it took to compute the DCI model on two-socket servers based on the Intel Xeon processor E5-2697 v4 and the Intel Xeon Gold 6148 processor. They also ran the workload on both two-socket and four-socket servers based on the Intel Xeon Platinum 8180 processor.
Performance scaled well on the new processors (Figure 2). The higher core counts and frequencies of the Intel Xeon Platinum 8180 processor offered especially significant performance advantages.
Performance was highest on the four-socket server configuration, which reduced the total runtime to just under 40 minutes, a 3.2X improvement versus the previous-generation two-socket server.
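As a quick consistency check on the benchmark figures, the runtime of the previous-generation two-socket server can be inferred from the reported 39 minutes and 3.2X improvement. The inferred value is derived here, not quoted from the benchmark itself.

```python
four_socket_minutes = 39   # Intel Xeon Platinum 8180, four sockets (Figure 2)
improvement = 3.2          # versus the previous-generation two-socket server

# Implied runtime on the previous-generation two-socket server.
implied_baseline_minutes = four_socket_minutes * improvement
implied_baseline_hours = implied_baseline_minutes / 60

print(f"implied baseline: {implied_baseline_minutes:.0f} min "
      f"({implied_baseline_hours:.2f} h)")
```

The implied baseline of roughly 125 minutes agrees with the "more than two hours" reported for the higher-resolution data.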
Sorting Through Hundreds of Images Quickly
Data processing isn’t the only challenge with using DCI in clinical settings. A clinical scan typically generates hundreds to thousands of individual images. This is part of a larger trend in medical imaging. The number of scanners in a typical hospital has increased dramatically in the last decade, as has the average number of images per study. With more patients being scanned and more images per scan, many radiologists are finding it difficult to keep up with the increasing volume of work.
Artificial intelligence (AI) offers a potential solution to this challenge. Professor Warfield is working with Intel tools and libraries to develop AI applications that automatically—and quickly—sort through hundreds of images to pinpoint those that differ from comparable images of a healthy brain. The goal is to provide the radiologist with the essential highlights of a complex study, identifying not only the most relevant images, but also pinpointing the critical areas on those images.
Mapping the Human Brain
DCI is beginning to be translated into clinical settings, as are the biomarkers determined by Professor Warfield and his team. Diffusion MRI is also seeing broader use in the research community. One example is the Human Connectome Project (HCP), a collaborative project that seeks to map the structural and functional connections of the human brain. Like the Human Genome Project before it, it is providing a rich set of baseline information that will help fuel faster and deeper research. With these and many other developments, our ability to understand and heal the human brain is ramping up fast.
Learn more at: http://www.crl.med.harvard.edu/
This article was produced as part of Intel’s HPC editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology. The publisher of the content has final editing rights and determines what articles are published.