
Google's visual search function can now answer even more complex questions

    When Google Lens was introduced in 2017, the search feature accomplished a feat that would have seemed like science fiction not long ago: point your phone's camera at an object and Google Lens can identify it, show you some context, and maybe even let you buy it. It was a new way of searching, without having to clumsily type out descriptions of the things in front of you.

    Lens also showed how Google planned to use its machine learning and AI tools to make sure its search engine appears on every possible surface. As Google increasingly leans on its generative AI models to summarize information in response to text searches, Google Lens visual search has evolved too. The company now says Lens, which powers about 20 billion searches per month, will support even more types of search, including video and multimodal queries.

    A new tweak to Lens means even more shopping context will appear in the results. Shopping is, unsurprisingly, one of the top use cases for Lens; Amazon and Pinterest also offer visual search tools designed to encourage more purchases. Search for your friend's sneakers in the old Google Lens and you might have been shown a carousel of similar items. In the updated version, Google says Lens will show more direct links for purchases, customer reviews, publisher reviews, and comparative shopping tools.

    Lens search is now multimodal, a hot word in AI these days, meaning people can search using a combination of video, images, and voice input. Instead of pointing their smartphone's camera at an object, tapping the focus point on the screen, and waiting for the Lens app to return results, users can point the camera and ask a question by voice at the same time, for example: “What kind of clouds are those?” or “What brand of sneakers are those and where can I buy them?”

    Lens will also work with real-time video capture, taking the tool a step beyond identifying objects in still images. If you have a broken record player at home or spot a blinking light on a malfunctioning device, you can take a short video through Lens and, via a generative AI overview, see suggestions for how to repair the item.

    First announced at I/O, the feature is considered experimental and is only available to people who have opted in to Google's Search Labs, says Rajan Patel, an 18-year Googler and co-founder of Lens. The other Google Lens features, voice mode and enhanced shopping, are rolling out more widely.

    The “video understanding” feature, as Google calls it, is intriguing for a number of reasons. While it currently works only with video captured in real time, if and when Google expands it to recorded video, entire repositories of footage (whether in someone's own camera roll or in a massive database like Google's) could potentially become taggable and overwhelmingly shoppable.

    The second consideration is that this Lens capability overlaps with Google's Project Astra, which is expected to become available later this year. Astra, like Lens, uses multimodal input to interpret the world around you through your phone. As part of an Astra demo this spring, the company showed off a prototype pair of smart glasses.

    Separately, Meta has just made a splash with its long-term vision for our augmented reality future, in which mere mortals wear dorky glasses that can cleverly interpret the world around them and show them holographic interfaces. Google, of course, already tried to realize this future with Google Glass (which relies on fundamentally different technology than Meta's latest pitch). Are Lens's new features, combined with Astra, a natural segue to a new kind of smart glasses?