Innovating New Ways to Share and Preserve Scientific Data on Sustainability
Rensselaer Polytechnic Institute is a key partner in a new project to create better technologies for scientists and engineers to store, share, and preserve important scientific data related to sustainability research.
Funded by an initial two-year, $2 million award from the National Science Foundation (NSF), the multi-university Sustainable Environment-Actionable Data (SEAD) effort is expected to receive a total of $8 million over five years. By pairing social networking technologies similar to Facebook, YouTube and Flickr with leading-edge web science and network science, the project aims to hasten scientific discovery and innovation. It will enable researchers who study sustainability to share their data in a way that is much easier and faster than current methods.
Underneath these user-friendly services will be a long-lived, robust digital infrastructure and organization to ensure that environmental and social data remain accessible and secure for decades into the future. The project is expected to lower the cost of administering these data sets, while increasing their usability and impact.
During the project, the research team will work closely with scientists in sustainable land use, water quality, urban planning, and agriculture, with an initial focus in the Upper Great Lakes and Mississippi River Basin. Leading the effort for Rensselaer is James Myers, director of the Institute’s supercomputing center, the Computational Center for Nanotechnology Innovations (CCNI).
“In this new project, we’ll be developing and delivering infrastructure that links active research and long-term preservation of important reference data to a degree that hasn’t been done before,” Myers said. “I believe this coupling will prove to be tremendously powerful in sustainability research and beyond and will ultimately have a dramatic impact on the pace of academic and industrial research, as well as the scope and scale of research projects that can be tackled.”
Leading the effort is principal investigator Margaret Hedstrom of the University of Michigan, along with co-principal investigators Myers from Rensselaer; Praveen Kumar of the University of Illinois; Beth Plale of Indiana University; and Ann Zimmerman of the University of Michigan. SEAD is funded as part of NSF’s Sustainable Digital Data Preservation and Access Network Partners (DataNet).
Researchers in the natural and social sciences collect and describe their data in different ways, and then store it in different formats and in different places. Making it easy for researchers to find and integrate that data opens the door for innovative research on the connections between the environment and human activities, and ultimately will improve our ability to understand and manage challenges such as those related to climate change, increasing demands for food and fuel, and societal changes, Myers said.
In addition to building a practical infrastructure to support sustainability researchers’ need to access and integrate a wide range of environmental and social data, SEAD is designed to explore two more interrelated challenges. First is the technical challenge of managing large amounts of diverse data over the long term. Second is the business challenge of structuring a data service organization that provides sufficient capabilities at low-enough cost to survive for the decades and centuries through which the data will have value to society.
Many Rensselaer faculty and student researchers will benefit from SEAD, and contribute to it directly and indirectly, Myers said. CCNI plans to offer SEAD technologies to support users of its anticipated new “balanced” supercomputer to be installed in the next few years, and will explore wider use in research projects across the Rensselaer campus.
Myers said he expects SEAD will be able to leverage the broad range of innovative work in data science occurring in many places at Rensselaer, including CCNI, the Data Science Research Center, Tetherless World Constellation, and Network Science and Technology Center.
“SEAD’s practical focus on providing services for sustainability research is motivated by and will further motivate the types of data science research being conducted at Rensselaer,” Myers said. “How do we integrate data and automatically discover features that span the combined data sets? What provenance do we need to record to allow us to trust data and validate scientific results? How can we identify truly valuable data from the network of discussions, publications, and decisions in which it is used? How do we do all of this with petabytes of data? These are all questions that data and network science researchers are exploring, and I hope my colleagues here at Rensselaer and others will see SEAD as a vehicle to bring their research findings to bear on critically important challenges facing our society.”
The National Science Foundation’s Sustainable Digital Data Preservation and Access Network Partners (DataNet) program has funded the following projects, in addition to SEAD: The Data Conservancy: A Digital Research and Curation Virtual Organization, based at Johns Hopkins University; DataONE: Observation Network, based at the University of New Mexico; the DataNet Federation Network, based at the University of North Carolina; and Terra Populus: A Global Population/Environment Data Network (TerraPop), based at the University of Minnesota.
Since opening in 2007 as the world’s seventh-largest computer, CCNI has helped researchers at Rensselaer and around the country tackle scientific and engineering problems ranging from the modeling of materials, flows, and microbiological systems to the development of entirely new simulation technologies. More than 700 researchers, faculty, and students from 50 universities, government laboratories, and companies have run high-performance science and engineering applications at CCNI.