Today, petabytes of digital information are generated daily by such sources as social media, Internet activity, surveillance sensors and advanced research instruments. The results are often referred to as “big data”—accumulations so huge that highly sophisticated computer techniques are required to identify useful information hidden within.
Graph analysis is a prime tool for finding the needle in the data haystack. This potent technology—not to be confused with simple illustrations like bar graphs and pie charts—utilizes mathematical techniques that represent relationships in the data more efficiently than traditional statistical analyses.
Researchers at the Georgia Tech Research Institute (GTRI) are bringing graph analytics to bear on a range of data-related challenges. They’re developing advanced technology that can help investigate social networks, surveillance intelligence, computer-network functionality, industrial control systems and more.
“Our first task is to look at the interesting properties of a graph—to find the important questions we can ask of that graph,” said Dan Campbell, a GTRI principal research engineer who heads the High Performance Computing Branch. “The second task is to find the answers as quickly as possible, and then put them to practical use.”
A graph is a type of data structure composed of entities (meaning anything that can be represented digitally) and their relationships. In graph terminology, an entity is a vertex or a node; the connections between it and other vertices are edges or arcs. Graphs are constructed using software algorithms that represent both the data points and the relationships between them, and that also enable computers to manipulate and analyze that information.
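To make the vertex-and-edge vocabulary concrete, here is a minimal Python sketch of an undirected graph stored as an adjacency list. It is an illustration only, not how STINGER or any GTRI system is implemented; the class name `Graph`, its methods and the sample names are all assumptions made for this example.

```python
from collections import defaultdict


class Graph:
    """Undirected simple graph: each vertex maps to the set of its neighbors."""

    def __init__(self):
        self._adj = defaultdict(set)

    def add_edge(self, u, v):
        # An edge records a relationship between two entities (vertices).
        if u == v:
            raise ValueError("self-loops are not supported in this sketch")
        self._adj[u].add(v)
        self._adj[v].add(u)

    def neighbors(self, v):
        # Return a copy so callers cannot mutate the internal state.
        return set(self._adj.get(v, ()))

    def num_vertices(self):
        return len(self._adj)

    def num_edges(self):
        # Every undirected edge is stored twice, once per endpoint.
        return sum(len(nbrs) for nbrs in self._adj.values()) // 2


# Hypothetical data: three people and two "knows" relationships.
g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
bob_neighbors = g.neighbors("bob")
```

In this sketch the question "who is connected to bob?" is a single dictionary lookup, which is the kind of relationship query that graph structures are designed to make cheap.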
GTRI researchers make extensive use of a graph-analysis framework called STINGER, built specifically to tackle dynamic, ever-changing applications such as social networks and Internet traffic. STINGER was created by a team led by David A. Bader, a professor in the School of Computational Science and Engineering; key members of that team included David Ediger and Robert McColl, who are now part of Campbell’s GTRI group. STINGER, which is open-source software (STINGERgraph.com), continues to be developed at Georgia Tech and in the broader graph analytics community.
“We’ve done a great deal of work on analyzing openly available social media in real time,” said Ediger. “Social media analysis clearly has an important role to play in emergency response to both natural disasters like Hurricane Sandy and to potential terrorist attacks, and we’re actively researching applications in those areas, among others.”
STINGER helps support GTRI’s focus on streaming or dynamic-graph technology, which can store very large databases and then update them in real time as new data come in. This novel approach allows users to monitor social media on a massive scale, and can also be utilized to simulate very large networks.
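The general idea behind a streaming or dynamic graph can be sketched in a few lines of Python: edge insertions and deletions arrive one at a time, and an analytic is updated incrementally rather than recomputed from scratch. The example below keeps a running count of triangles (groups of three mutually connected vertices). It is a simplified illustration under stated assumptions, not STINGER's data structure or API; `StreamingGraph`, `insert`, `delete` and the sample update stream are hypothetical names and data.

```python
from collections import defaultdict


class StreamingGraph:
    """Undirected graph that maintains a triangle count as edges change."""

    def __init__(self):
        self._adj = defaultdict(set)
        self.triangles = 0

    def insert(self, u, v):
        if u == v or v in self._adj[u]:
            return  # ignore self-loops and duplicate edges
        # Each neighbor shared by u and v closes one new triangle.
        self.triangles += len(self._adj[u] & self._adj[v])
        self._adj[u].add(v)
        self._adj[v].add(u)

    def delete(self, u, v):
        if v not in self._adj[u]:
            return  # edge is not present
        self._adj[u].discard(v)
        self._adj[v].discard(u)
        # Each neighbor shared by u and v was a triangle through this edge.
        self.triangles -= len(self._adj[u] & self._adj[v])


# Hypothetical stream of updates, applied in arrival order.
stream = [
    ("insert", "a", "b"),
    ("insert", "b", "c"),
    ("insert", "c", "a"),
    ("insert", "a", "d"),
    ("insert", "d", "b"),
    ("delete", "a", "b"),
]

sg = StreamingGraph()
history = []
for op, u, v in stream:
    if op == "insert":
        sg.insert(u, v)
    else:
        sg.delete(u, v)
    history.append(sg.triangles)
```

Each update touches only the two affected vertices and their neighbor sets, so the cost of keeping the count current depends on the local neighborhood rather than on the size of the whole graph.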
“Unlike traditional graph databases, STINGER’s streaming-graph technology lets us store very big graphs and analyze them at high speed using fairly modest computing capability,” said Jason Poovey, a GTRI research scientist in Campbell’s group. “In half a terabyte of main memory—a pretty reasonable size today—we can handle billions of nodes and edges. Our benchmark tests show we can represent, update and analyze a graph in real time that’s essentially the size of all the data in Twitter.”
Source: Georgia Institute of Technology