If you show PoseNet a photo, it will tell you exactly where it was taken. That sounds easy in a world where every photograph you snap is tagged with GPS coordinates, but PoseNet doesn't need GPS. Instead, it actually recognizes the scene in the image, and works out where you were standing based on that.
The system is accurate to six feet, and can even tell which way you were facing when you took the photo—to within three degrees.
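To make those two figures concrete, here is a minimal sketch of how a location estimate could be checked against them. It assumes positions are given in meters on a flat local grid and that heading is a compass-style angle in degrees; the function names and default tolerances are illustrative and not taken from PoseNet itself.

```python
import math

FEET_PER_METER = 3.28084

def position_error_feet(estimated, actual):
    """Straight-line distance between two (x, y) points in meters, returned in feet."""
    return math.dist(estimated, actual) * FEET_PER_METER

def heading_error_degrees(estimated, actual):
    """Smallest angle between two headings, handling wrap-around at 360 degrees."""
    diff = abs(estimated - actual) % 360.0
    return min(diff, 360.0 - diff)

def within_tolerance(est_xy, true_xy, est_heading, true_heading,
                     max_feet=6.0, max_degrees=3.0):
    """True if the estimate is within six feet and three degrees of the truth."""
    return (position_error_feet(est_xy, true_xy) <= max_feet
            and heading_error_degrees(est_heading, true_heading) <= max_degrees)
```

A guess that is 1.5 meters off and two degrees off passes; one that is 2 meters off does not, because 2 meters is a little more than six feet.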
PoseNet, from researchers at the University of Cambridge, does its magic with deep convolutional neural networks, which are loosely modeled on the way the visual cortex of animals processes visual stimuli. These networks can be used for image recognition, including picking out faces from a crowd, even when the faces are partially hidden or upside down.
The technique has a few advantages over other kinds of image recognition. First, it’s fast. Show PoseNet a photo and, within five milliseconds, it will tell you where the photo was taken. Next, it’s lightweight. The PoseNet system relies on a database of less than 50 megabytes, whereas some rival systems need to store gigabytes of reference photographs, and then process them.
"I believe PoseNet has three main advantages over GPS and related technologies," PoseNet’s Alex Kendall tells Co.Exist. "Firstly, GPS requires infrastructure (e.g., the satellites). Secondly, GPS does not give you an estimate of orientation. Third, GPS is often inaccurate, and does not work in indoor environments."
PoseNet needs to be trained first. Training involves showing it lots of photographs, which it studies and reduces to a small database. The images need to be labeled with 3-D camera location data, which tells the system where the camera was and which way it was facing, but that’s common in today’s cameras.
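As a rough illustration of what a labeled training photo might look like, the sketch below pairs an image file with a camera position and an orientation, then scores a guess against that label. The field names, the use of a quaternion for orientation, and the weighting factor `beta` are all assumptions made for illustration; the article does not describe PoseNet's actual training objective.

```python
import math
from dataclasses import dataclass

@dataclass
class LabeledPhoto:
    path: str            # hypothetical image file name
    position: tuple      # camera location (x, y, z) in meters
    orientation: tuple   # camera facing as a quaternion (w, x, y, z)

def normalize(q):
    """Scale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def pose_loss(pred_position, pred_orientation, label, beta=1.0):
    """Position error plus a weighted orientation error; lower is better.

    Simplification: q and -q describe the same rotation, and this
    sketch does not account for that.
    """
    position_term = math.dist(pred_position, label.position)
    orientation_term = math.dist(normalize(pred_orientation),
                                 normalize(label.orientation))
    return position_term + beta * orientation_term
```

The `beta` weight is there because meters and quaternion components are on different scales, so something has to decide how much a facing error counts relative to a location error.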
Currently PoseNet only works in one part of Cambridge, England (it’s a tech demo), but you can try it out yourself. The system was trained using a data set of 12,000 images, covering six scenes around Cambridge University. Because the system is so fast, and the data storage requirements so low, it could easily be scaled for worldwide use. Imagine if this tech were given access to Google’s Street View data: You’d be able to show it pretty much any photograph and know instantly where it was taken.
Speaking of Google—the search company has its own project that attempts to do the same thing: work out the location of a photo just by looking at it. But unlike PoseNet, Google’s PlaNet manages to place just 3.6% of images at street-level accuracy. The success rate rises to 10.1% at city level, but that’s hardly the "superhuman levels of accuracy" claimed by the PlaNet team.
"Our approach is most likely more accurate as it is trained on a smaller scale than PlaNet," says Kendall. And PoseNet can do something Google’s project cannot. "PoseNet is attractive because it is able to estimate the camera's location in metric coordinates. In contrast, PlaNet simply classifies an image into a discrete region."
There are, of course, privacy concerns with such technology. You can scrub the GPS coordinates from your photos easily, and devices like the iPhone remove them automatically when you share an image. But PoseNet needs no such GPS data: it just needs the picture and it knows where you are. There’s no hiding. Law enforcement is going to love it. The rest of us might think twice about posting all our photos online.