Some mysteries of science can only be explained on a nanometer scale — even smaller than a single strand of human DNA, which is about 2.5 nanometers wide. At this scale, scientists can investigate the structure and behavior of proteins that help our bodies fight infectious microbes, and even catch chemical reactions in action. To resolve these very fine details, they rely on synchrotron light sources like the Department of Energy’s Advanced Light Source (ALS) at the Lawrence Berkeley National Laboratory (Berkeley Lab).
For decades, synchrotron light sources have been operating on a manual grab-and-go data management model: users travel thousands of miles to run experiments at the football-field-size facilities, download raw data to an external hard drive, then process and analyze the data on their personal computers, often days later. But a recent deluge of data, brought on by faster detectors and brighter light sources, is quickly making this practice impractical.
Fortunately, ALS X-ray scientists, facility users, and computer and computational scientists from Berkeley Lab’s Computational Research Division (CRD) and the National Energy Research Scientific Computing Center (NERSC) recognized this developing situation years ago and teamed up to create new tools for reducing, managing, analyzing and visualizing beamline data. The result of this collaboration is SPOT Suite, and it is already transforming the way scientists run their experiments at the ALS.
Science Superhighway: Setting the Stage for SPOT
At the ALS and other light sources, bunches of electrons shoot down a linear accelerator, going from zero to almost light speed faster than the blink of an eye. The electrons are then transferred to a booster, and then to a storage ring where specialized magnets accelerate and focus them further, generating a cascade of X-ray photons along the way. This light is then tuned and focused to feed 40 simultaneously operating beamlines, each with wavelengths ideal for resolving the molecular, atomic, electronic, and magnetic properties of everything from proteins to computer chips. By using computers to process and analyze the resulting data, users can extract the very fine details of a sample’s structure and behavior.
“When a scientist — be it biologist, chemist or geologist — comes to the ALS with their samples, they spend one or two days, maybe even a week, collecting a whole bunch of data. Then they’ll download it, take it home, and most of that data will never get analyzed,” says Craig Tull, who leads Berkeley Lab’s Science Software Systems Group and the SPOT Suite Laboratory Directed Research and Development (LDRD) project. “SPOT Suite is designed to streamline and automate this process.”
One of the first things the SPOT Suite collaboration did was set up data transfer nodes between NERSC and three ALS beamlines: 7.3.3 SAXS/WAXS/GISAXS/GIWAXS (small-angle and wide-angle X-ray scattering, plus their grazing-incidence variants), 8.3.2 hard X-ray microtomography and 12.3.2 microdiffraction.
During an experiment, the data transfer nodes allow beamline data to travel across DOE’s Energy Sciences Network (ESnet) at speeds of gigabits (one billion bits) per second. Once the data arrives at NERSC, it is archived, processed on supercomputers and then served up to users in real time via a web portal or “science gateway.” As a result, researchers are interpreting and sharing data and adjusting their experiments in minutes or hours instead of weeks or months.
“Our goal with SPOT Suite is to make advanced algorithms, software and high performance computing available to the masses,” says Tull. “We want beamline scientists and users to be able to access these resources without having to become super experts in computer or computational science.”
Automating Beamline Experiments
Earlier this year, the ALS became the first and only facility worldwide to fully automate GISAXS/GIWAXS measurements. This tool is primarily used to characterize the assembly and shape of nanoscopic objects at surfaces or buried interfaces in thin films — including materials like organic photovoltaics, fuel cell membranes or batteries.
Combine this capability with SPOT Suite, and researchers can run experiments at this beamline from anywhere in the world, provided they have Internet access. Users can mount their samples onto barcoded sample holders at home and ship them to the ALS. At the facility, a robot arm transfers each new sample to the measurement stage, where it is automatically aligned into grazing incidence using the X-ray beam. A barcode reader tells the computer system which sample is mounted and how to run it. Data acquisition software then moves the sample to all angles pre-specified by the researcher and automatically chooses the appropriate exposure time for each image. As images are collected, SPOT Suite sends the data to NERSC via ESnet for scientists to access and view at any time.
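The automated sequence described above (identify the sample by barcode, step through the researcher's pre-specified angles, pick an exposure time per image) can be pictured as a simple loop. The Python sketch below is only an illustration, not the ALS acquisition software: the names `SamplePlan`, `Exposure`, `choose_exposure_s` and `run_queue` are hypothetical, and the exposure rule (scale a short test exposure toward a target count level) is an assumption, since the article does not say how exposure times are chosen.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SamplePlan:
    """Hypothetical plan looked up from the barcode on a sample holder."""
    barcode: str
    angles_deg: List[float]  # incidence angles pre-specified by the researcher


@dataclass
class Exposure:
    """One planned image: which sample, at which angle, for how long."""
    barcode: str
    angle_deg: float
    exposure_s: float


def choose_exposure_s(test_counts: int,
                      target_counts: int = 10_000,
                      test_exposure_s: float = 0.5,
                      max_exposure_s: float = 30.0) -> float:
    """Assumed rule: scale a short test exposure to reach a target count level."""
    if test_counts <= 0:
        return max_exposure_s  # no signal seen: fall back to the longest exposure
    return min(max_exposure_s, test_exposure_s * target_counts / test_counts)


def run_queue(plans: List[SamplePlan],
              measure_test_counts: Callable[[str, float], int]) -> List[Exposure]:
    """Visit every sample and angle in order, choosing an exposure for each image."""
    exposures = []
    for plan in plans:                   # one plan per barcoded sample
        for angle in plan.angles_deg:    # every angle the researcher asked for
            counts = measure_test_counts(plan.barcode, angle)
            exposures.append(
                Exposure(plan.barcode, angle, choose_exposure_s(counts)))
    return exposures
```

In this sketch the detector is stood in for by the `measure_test_counts` callback, so the loop can be exercised without any hardware; a real system would also handle sample alignment and data transfer, which are left out here.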
“This automated system represents a significant leap forward in terms of labor saving, ease of use and throughput,” says Alexander Hexemer, who manages the GISAXS/GIWAXS beamline at the ALS.
In fact, when the facility conducted its first high-volume test run, the automated-robotic system collected over 1,300 exposures. The researcher — Alessandro Sepe, leader of the X-ray scattering group and responsible for the synchrotron studies at the Cavendish Laboratory, University of Cambridge — viewed all of the incoming data from his mobile device via SPOT Suite. He was able to inspect the data for quality and exposure issues and provide feedback on the experimental setup — all while traveling by train between the United Kingdom and Switzerland.
“This automated system allows us to run our experiments much faster [and] dramatically reduces our measuring time,” says Sepe.
Before the GISAXS/GIWAXS automation, Sepe would have to travel to Northern California with at least four other colleagues to run experiments at the ALS. “To make the most of my allocation, I was running experiments at the beamline 24 hours a day. This means that I needed at least two people working 12-hour shifts at the beamline,” says Sepe. “It was quite expensive and time-consuming. But now I can see what the detector is seeing without delay from thousands of miles away.”
According to Hexemer, the GISAXS/GIWAXS beamline at the ALS is currently oversubscribed by a factor of two. “There are only a handful of places in the world where you can do this kind of science, so beamtime is very precious,” says Hexemer. “If you are very lucky, you might get two days of beamtime every six months.”
Time-consuming data analysis is currently the main bottleneck to scientific productivity on his beamline, he adds. “Right now data analysis takes so long that by the time users come back for their follow-up runs six months later, many will not have analyzed all of the data from their previous visit,” says Hexemer.
This is why he believes SPOT Suite’s framework to deploy analysis code is so transformative. “[With SPOT Suite] you can look at your data in real time and make adjustments to your next experiment based on that result,” says Hexemer. “It allows you to really take advantage of your beamtime.”
Michael Manga, a UC Berkeley professor of Earth and Planetary Sciences whose group runs experiments on the ALS 8.3.2 microtomography beamline, is already relying on this new software tool to do the bulk of his data processing. Before SPOT Suite, every time Manga imaged a rock sample, the raw data would be saved to a server at the ALS. He then had to transfer that data to a local drive, import it to his computer and manually manipulate and reconstruct it. Because each image file was about 20 gigabytes, this process was very time-intensive. And while he was handling these files, he would concurrently be making adjustments to his beamline experiments and imaging other samples.
Now that SPOT Suite immediately transfers and archives microtomography beamline data at NERSC and has mostly automated the data processing, Manga can just open a web browser and watch his reconstructed data come in. In the past year his team collected 900 sets of 2000 images, some from experiments that required continuous monitoring and intervention.
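For a rough sense of scale, the figures quoted above can be worked through in a few lines of Python. The image counts and the 20-gigabyte file size come from the article; the link rates of 1 and 10 gigabits per second are assumptions chosen only for illustration (the article says "gigabits per second" without giving a number), and the estimate ignores protocol overhead and disk speed.

```python
# Figures from the article.
SETS = 900
IMAGES_PER_SET = 2000
FILE_GIGABYTES = 20  # "about 20 gigabytes" per image file

total_images = SETS * IMAGES_PER_SET  # 1,800,000 images in the past year


def transfer_seconds(size_gigabytes: float, link_gbit_per_s: float) -> float:
    """Idealized transfer time: 8 bits per byte, no overhead (an assumption)."""
    return size_gigabytes * 8 / link_gbit_per_s


if __name__ == "__main__":
    print(f"Images collected: {total_images:,}")
    for rate in (1, 10):  # assumed link rates in gigabits per second
        seconds = transfer_seconds(FILE_GIGABYTES, rate)
        print(f"20 GB file at {rate} Gb/s: about {seconds:.0f} s")
```

Under these assumptions, 900 sets of 2000 images is 1.8 million images, and a single 20-gigabyte file takes about 160 seconds to move at 1 gigabit per second or about 16 seconds at 10 gigabits per second.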
“The ability to have data automatically transferred and reconstituted saved us weeks of work and freed up beamtime to focus on doing experiments,” says Manga.
In addition, with SPOT Suite, collaborators sitting anywhere in the world can watch ALS experiments in real time over the Internet. “The last time I was at the ALS I had collaborators in Switzerland and Atlanta, both logged into SPOT Suite and watched our data come in and be processed in real time,” he says.
Manga predicts that one of the biggest impacts SPOT Suite will have on beamline science is the ability to identify real-time science opportunities. “If I am doing something and realize that so-and-so at the University of Bristol will be interested in this topic and have something insightful to say, I can now just send them an email with a link to my data,” he says. “It’s the ability to collaborate in real time that will have the biggest effect on our science.”
Even though SPOT Suite has been in the works for about two years, Tull emphasizes that it is still in an early phase. As the team better understands how computing, data and metadata will best be used to advance research at the beamlines, improvements will be made to the tool.
For instance, Hexemer is currently working with researchers in CRD’s Center for Applied Mathematics for Energy Research Applications (CAMERA) to develop a tool that would allow users to perform real-time simulation of scattering patterns while they are collecting data at the GISAXS beamline. Comparing the computed and observed patterns would show users how they might adjust their setup to conduct better experiments.
They are also working on a tool to extract key structural and morphological information from observed data with high confidence, which will allow researchers to make the most of their GISAXS allocation. Once these features are more fully developed, Hexemer hopes to incorporate them into SPOT Suite.
“At this point, light sources have entered into a brand new era, where they have more data than their previous approaches and hardware can handle,” says Tull. “Because of the experience that we in CRD, NERSC and ESnet have in supporting high-energy physics and other science domains with big data challenges, I believe our work will have a transformative effect on their science.”
More information: Big Data Hits the Beamline
About Berkeley Lab Computing Sciences
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy’s research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe. ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 5,500 scientists at national laboratories and universities, including those at Berkeley Lab’s Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation.