You do two things when you type a query into a search engine: you identify the object you are looking for, and you supply its context. This manual input has to happen before the engine can return a valid result.
Computer vision has the potential to change the entire nature of search because it eliminates the need for the user to type a query. Instead, it uses sensor information (such as sound and images) to give the engine the context for the search.
It makes sense. Humans create meaning and relevance for objects in the real world in much the same way computer vision does. Say there is an object in front of you. Your brain must first use your eyes to see that object and its context before it can attach any meaning to it.
What object are you looking at? Oh, it’s a shark. So what situation are you in? Are you behind glass, at an aquarium, or is it next to you in the ocean? Only after the object has been recognized and placed in context can your brain create the right query to give that object the appropriate meaning: Oh, cool shark … or, Uh-Oh (and total panic).
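The shark example can be sketched in a few lines of code. This is only an illustration of the idea that a query is an object plus its context; the names Observation, build_query and react are hypothetical, and the labels stand in for whatever a real recognizer would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What the sensors report: a recognized object plus its surroundings."""
    obj: str       # hypothetical recognizer label, e.g. "shark"
    context: str   # hypothetical scene label, e.g. "aquarium" or "open ocean"


def build_query(obs: Observation) -> str:
    """Combine object and context into the query a user would otherwise type."""
    return f"{obs.obj} {obs.context}"


def react(obs: Observation) -> str:
    """Toy 'meaning' step: the same object means different things per context."""
    if obs.obj == "shark" and obs.context == "aquarium":
        return "Oh, cool shark"
    if obs.obj == "shark" and obs.context == "open ocean":
        return "Uh-oh"
    return "unknown"


if __name__ == "__main__":
    for scene in ("aquarium", "open ocean"):
        seen = Observation(obj="shark", context=scene)
        print(build_query(seen), "->", react(seen))
```

The object is the same in both cases; only the context changes, and the context is what decides the response.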
Eyes and computer vision are game changers, but to establish perfect context, our devices must also be able to process natural language. In augmented reality, computer vision and speech recognition together will transform search. They will replace the traditional search engine as the origin point of most queries.
Technologies like Amazon’s Flow look more like the search of the future. Flow uses both bar code and image recognition to continuously recognize tens of millions of products in a live camera view. Users point the camera at an item, and Flow overlays pricing, availability, reviews, media content and other information directly over the item in view.
Some day your devices will already know what you are trying to find, because their sensors will have enabled them to anticipate your query from context, much as Flow does in real time, only far more powerfully.
Computer vision will also change where the query originates. The query will no longer originate when a user goes to a search engine site and manually enters the object and context they seek; it will originate from the computer’s eyes and ears.
The eyes and ears therefore disrupt the search engine by bypassing it and moving to the top of the search pyramid. They will have the leverage to claim most of the finder’s fee, while search engines become a back-end commodity that gets queries only at the discretion of whoever controls your smart devices’ eyes and ears. This has to be why Amazon created Amazon Flow.
Further, we will eventually reach a point where humans often will not need to initiate the query at all; our devices, through their sensors, will anticipate the query before it is asked. This is post-search anticipatory computing, and it is why we will start to see sponsored recognitions replace sponsored searches.
And whoever controls the brand scanner can monetize without having to pay a finder’s fee. The visual search engine replaces the prime Madison Avenue real estate of Google’s search page, making Google more like a back-end commodity to those who control our devices’ ears and eyes.
It sounds a little Orwellian, but to get to post-search, our devices are going to have to be constantly listening to and watching what we are doing. This is all inevitable, and good. It will lead us into a new age of true anticipatory, contextual and augmented computing, where our devices provide the information and intelligence at the exact moment we want it.
However, while augmented reality head-mounted displays (AR HMDs) will enable us to summon on demand anything we can imagine, this technology will need to move even further, into the Post-Search Era, to have its full impact. Whether AR HMDs can do this is the billion-dollar question.