Hampton Creek, a San Francisco startup that makes eggless egg products in a lab, has garnered plenty of attention over the past year for its first product: Just Mayo, an egg-free mayonnaise that has transcended the Whole Foods yuppie boundary and is now sold in chains like Target, Wal-Mart, and Costco.
Behind the scenes, Hampton Creek is on a quest to build the world's largest plant database—a catalogue of plants and the properties that make them useful for different food applications. And one month ago, the company hired Dan Zigmond, the former lead data scientist for Google Maps, to lead its data efforts. The plant library quest just became serious.
Zigmond, who built the data team at YouTube prior to his stint at Google Maps, was attracted to Hampton Creek by its data challenges. "The mission is just so compelling. There’s human health, environmental sustainability, animal welfare. I think it’s hard to find anybody who doesn’t have a connection to one of those things, and I felt a connection to all of them," he says. "I wouldn’t have gone if I didn’t think there was a tremendously exciting technical challenge in organizing and understanding plant compounds."
When Zigmond arrived at Hampton Creek, the company had already gone through roughly 4,000 samples of different plant materials and had just started to organize and understand them. It had also hired Lee Chae, a bioinformatics specialist, to test basic biochemical and food-related properties of the plants.
Hampton Creek is looking at everything from the structure of a plant's chemical bonds and its ability to form bonds with other compounds to how it reacts with water. Some properties may make a plant better suited to mayonnaise (as an emulsifying agent) or to cookies (as a gelling agent), and so on.
"One of the first things we’re doing is figuring out what data already exists out there. There are other sources of data on plants, and we’re also trying to understand within the data we have—what are the features of these molecules and plant compounds that are good predictors of their use in foods and other applications?" says Zigmond.
Zigmond and the team are now re-analyzing data from the initial 4,000 samples to predict how they'll behave in the lab, in addition to looking at new plants. As the library grows, predicting plant properties rather than measuring each one will be key.
"On the one hand, we need to scale the lab itself, adding automation and doing assays in greater volumes," he says. "But no matter how high the volume, there's no way we could physically test every plant compound." There are about 400,000 known plant species—emphasis on "known"—and just a couple dozen fully sequenced plant genomes, according to Chae. That leaves Hampton Creek with a lot of work to do.
"It's a combination of predictive modeling and experimental assays," says Chae. Analyzing a single plant sample takes between an hour and a day, depending on the food application that Hampton Creek is looking at.
So far, the company hasn't run into any issues with plant availability, since the amount of plant material it needs for any given product is relatively small. "It may eventually be a factor. If we’re extremely successful or find promising materials that are quite rare, we need to factor that in," says Zigmond.
Hampton Creek isn't sharing any of its plant data yet, but there are obvious applications outside the company's mission—it could be useful to farmers, for example, in helping them decide which plants would make the best crops. "It's quite possible that licensing this data will be part of our long-term business model," says Zigmond.
[Illustrations: Courtesy of Flickr user Swallowtail Garden Seeds]