It’s Time For “The Visual Internet”

Raghava KK Contributor

Raghava KK is co-founder of Flipsicle, an incubator building out visual search and discovery solutions.

One of the amazing accomplishments of the smartphone is how it’s inspired us to take photos and share them — tons of them, with everyone. It’s estimated we have already taken more than a trillion photos in 2015. That would be more snapshots than have been taken in all of photographic history till now.

We should be proud. But it isn’t enough to take a photo. We need to find concrete ways to connect, discover and search these trillions of available pictures.

The Internet, at 25 years old, has more than a billion connected sites. However impressive, that pales in comparison to the 4.7 trillion photos taken in the last few years, which have documented our lives, aspirations, feelings and creativity. I can’t help but wonder what a backup of this total human consciousness would look like. Imagine the superhuman web of insights that might emerge if you connected all the photos taken!

It’s time for the emergence of a visual Internet. A visual Internet differs from a semantic one. Let’s look at the language of artists, visual thinkers who understand and manipulate visuals for a living. To an artist, a photo is multivalent: it means different things to different people.

Artists see visuals as both windows and mirrors. A visual window lets us look into a photo to see what’s inside: a dog, a tree, a beach, etc. Google’s search algorithms have been okay at telling us this objective content of a photo. The visual mirror, however, reflects the biases we bring to the visual: what the photo means to each of us.

A visual Internet will build connections between photos both as windows and as mirrors. It will comprehend what’s in the photo and guess how a viewer may interpret it.

But how do we build these connections? So far, much of artificial intelligence (AI) in the visual space has focused on the window of a photo, analyzing and connecting based on objective content. But when you focus on the multivalence of a photo, you build richer intelligence and create more complex relationships between photos based on the mirror aspect — that is, subjective meanings.

The tricky part of subjective knowledge is that we can’t rely on users to tell us how they really feel about a photo, nor can we get at the true meaning of a photo directly. That meaning is too often subconscious and has to be decoded from the user’s behavior. One thing we know for certain is that no computer can gauge your emotional response to something. Well, not yet.

So I suggest a new kind of user interface. Early in the Internet years, an interface was about beauty, whereas now it concerns relevance, behavior and choice. With a good user interface, we can model simple human behavior to see how people subconsciously react to any photo. Tinder’s swipe is a great example.

The brain does the heavy lifting of interpreting a photo; the user’s actions reveal that interpretation, and we can study those actions through big data. As we tally data, we can cluster people based solely on their subconscious reactions to a photo. And that’s how we begin to understand the way people make meaning of photos.
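To make the clustering idea concrete, here is a minimal Python sketch. It is only an illustration, not a description of how Flipsicle or any other product works: the swipe log, the user names, the cosine similarity measure and the 0.5 threshold are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical swipe log: +1 = swiped right (liked), -1 = swiped left,
# 0 = photo not shown. Each list position is the same photo for every user.
REACTIONS = {
    "ana":   [1, 1, -1, -1, 0],
    "ben":   [1, 1, -1, 0, -1],
    "chloe": [-1, -1, 1, 1, 0],
    "dev":   [-1, 0, 1, 1, 1],
}


def cosine(a, b):
    """Cosine similarity of two reaction vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_users(reactions, threshold=0.5):
    """Greedy grouping: a user joins the first cluster whose first member
    reacted similarly enough; otherwise the user starts a new cluster."""
    clusters = []
    for user, vector in reactions.items():
        for group in clusters:
            if cosine(vector, reactions[group[0]]) >= threshold:
                group.append(user)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([user])
    return clusters


if __name__ == "__main__":
    print(cluster_users(REACTIONS))
```

With this toy data, the people who swiped the same way on the same photos end up grouped together, without anyone having said a word about what the photos mean to them. A real system would need far more data and a sturdier clustering method, but the principle is the same.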

I’ve been thinking: What if someone owned all the visual real estate on the Internet? They could collect invaluable insights about the way we think. Pinterest is on its way there. Aside from being a fine place to organize your ideas and thoughts, it has, under the hood, already indexed almost all the photos on the Internet (or the ones worth indexing). Pinterest should be thinking about new ways to let people discover this wealth of information.


We have seen Instagram turn on the “Explore” feature, while e-commerce players (Fancy, Wanelo and WeHeartIt) monetize user curation. MetaMind uses what it calls “vision algorithms” for deep learning. All these sites, however, still expect users to tell them explicitly how they organize and make sense of photos. I am waiting for new interfaces that collect information based solely on a user’s passive browsing.

Dr. Ramesh Jain, a computer science professor from UC Irvine, is attempting to solve this same problem through his startup Krumbs, which will let you connect photos via several criteria. Jain believes that just searching for a photo isn’t enough; we should be able to search from within a photo, too.

Here at Flipsicle we hope to plant the seeds for a visual Internet by using big data to connect visuals based on how users subconsciously make meaning from their photos: what their behavior tells us the photos make them think or feel.

For me, art and tech are two eyes on the same body. Open one eye and view this beautiful world of ours. Open the other and you envision a richness you can’t articulate. Brought together — the objective and the subjective — they hand us immensely rich insights into who we are and how we think.

Now that we have all this information, what do we do with it? Understanding how a user thinks is as important as understanding how a user behaves. Great design, personal photos and subjective AI are about to become the enterprise game changers that will create fantastic businesses no one dreamed of. We will get closer to the user than ever before. And finally, computers will generate the much-needed empathy in this world.