Virtual Reality: Who says Rome wasn’t built in a day?
With the rise in popularity of Internet photo-sharing sites like Flickr and Google, community photo collections have emerged as a powerful new type of image dataset. This kind of data presents an opportunity to reconstruct the world’s geometry using the largest, most diverse, and largely untapped multi-view stereo dataset ever assembled. What makes the dataset unusual is not only its size, but the fact that it has been captured “in the wild,” not in the laboratory, leading to a set of fundamental new challenges in multi-view stereo research. This rendering of a 3-D model of the Duomo in Pisa was reconstructed from 56 photographs downloaded from Flickr. Images found on these public photo-sharing sites can now be used to help build accurate 3-D models of the real world. Courtesy of Michael Goesele, TU Darmstadt, and collaborators at the University of Washington.
With the muscle of about 500 computers and 150,000 still images, Steve Seitz and his colleagues have reconstructed many of Rome’s famous landmarks in just 21 hours. With support from the National Science Foundation (NSF), they’re rebuilding Rome pixel by pixel rather than brick by brick.
“The idea behind ‘Rome in a Day’ is that we wanted to see how big of a city or model we could build from photos on the Internet,” says Seitz, a professor in the Department of Computer Science and Engineering at the University of Washington’s Seattle campus.
Calculations that once took months now take hours. “This is the largest 3-D reconstruction that anyone has ever tried,” explains Seitz. “It’s completely organic; it works just from any image set.”
The project starts with a trip to the photo-sharing site Flickr to search for images of the real thing. Once pictures are identified, the computer starts the process of making 3-D objects from 2-D stills. Sameer Agarwal, a former postdoctoral scholar at the university, is largely responsible for creating the algorithm that builds 3-D objects in virtual space from thousands of 2-D images.
“If I were a sculpture and there were three photographs of me, we would try to find the point in each photograph that corresponds to my nose. From that, we know that there are three points in these images that correspond to a single point in the 3-D world,” explains Agarwal. “We would be able to say where, in a particular image corresponding to that camera, the image of my nose should show up. This statement can be written as an equation involving the position and orientation of the camera, the position of my nose, and where in the image my nose shows up. You can connect all of these equations together and solve them to, in one shot, obtain both the positions of the cameras and the position of my nose in the 3-D world relative to those cameras.”
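Agarwal’s description can be made concrete with a small sketch. The full system solves for camera poses and 3-D points jointly (structure from motion with bundle adjustment); the simplified Python example below assumes the camera matrices are already known and recovers a single 3-D point, such as the nose, from its projections in three images via linear triangulation. The function name and the toy cameras are illustrative, not taken from the project’s code.

```python
import numpy as np

def triangulate(projection_matrices, image_points):
    """Linear (DLT) triangulation: recover one 3-D point from its
    2-D projections in several cameras with known 3x4 matrices."""
    rows = []
    for P, (x, y) in zip(projection_matrices, image_points):
        # Each observation contributes two linear constraints on the
        # homogeneous 3-D point X: x * (P[2] @ X) = P[0] @ X, and
        # y * (P[2] @ X) = P[1] @ X.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.vstack(rows)
    # The least-squares solution is the right singular vector of A
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # convert from homogeneous coordinates

# Toy example: three hypothetical cameras observing the point (0, 0, 5).
if __name__ == "__main__":
    point = np.array([0.0, 0.0, 5.0, 1.0])
    cameras = [
        np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.0]])]),   # camera at the origin
        np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]),  # camera shifted along x
        np.hstack([np.eye(3), np.array([[0.0], [-1.0], [0.0]])]),  # camera shifted along y
    ]
    observations = []
    for P in cameras:
        proj = P @ point
        observations.append((proj[0] / proj[2], proj[1] / proj[2]))
    print(triangulate(cameras, observations))  # approximately [0. 0. 5.]
```

In the real pipeline the cameras are unknown too, so equations like these are assembled for millions of matched points and solved simultaneously for both camera poses and point positions, which is why hundreds of computers are needed.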
Computers map huge clusters of points in 3-D space, creating ghost-like images called “point clouds.” Seitz says the imaging is very accurate.
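Under the hood, a point cloud is simply a large list of 3-D coordinates, usually with a color sampled from the photographs attached to each point. As a hypothetical illustration (the helper and file name below are not from the project), here is a minimal Python sketch that writes such a cloud to a PLY file, a plain format most 3-D viewers can open.

```python
import numpy as np

def save_point_cloud_ply(path, points, colors):
    """Write an N x 3 array of 3-D positions and matching RGB colors
    to an ASCII PLY file."""
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ])
    with open(path, "w") as f:
        f.write(header + "\n")
        for (x, y, z), (r, g, b) in zip(points, colors):
            f.write(f"{x} {y} {z} {int(r)} {int(g)} {int(b)}\n")

# Toy example: 1,000 random points on a unit sphere, colored a uniform gray.
points = np.random.randn(1000, 3)
points /= np.linalg.norm(points, axis=1, keepdims=True)
colors = np.tile([200, 200, 200], (1000, 1))
save_point_cloud_ply("cloud.ply", points, colors)
```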
“For the buildings, I think we can get accuracy to within a few centimeters. We’ve measured this. For individual objects that are photographed closer, we can potentially do a lot better, like millimeter accuracy.”
Finally, color and texture are added. What Seitz and his colleagues have produced are virtual 3-D tours of cities such as Dubrovnik, Croatia, and Venice, Italy.
“What excites me is the ability to capture the real world; to be able to reconstruct the experience of being somewhere without actually being there,” says Seitz.
In the future, this next-generation technology may show up online in mapping sites, video games and real estate listings. It’s a virtual guarantee.