Eye-tracking has long been one of those computing holy grails. Whether it’s accessibility concerns, opening up new form factors or just trying to put a novel spin on the ways we interact with our devices, decades of consumer electronics are littered with attempts to use the gaze to unlock a new form of input.
The thinking behind Carnegie Mellon University’s punnily named EyeMU (I suppose “ICU” is a bit intense) is simple. Phones are big these days. If you’ve ever attempted to use a modern flagship with a single hand, you know the inherent pain points. If you’re lucky, maybe you can tap an icon with a thumb while drinking a coffee with your other hand.
While most previous attempts have fallen short, current phones sport a variety of different technologies that could help unlock this functionality in a natural way. I recall trying some TV sets with eye tracking years ago and feeling the same sort of frustration one encounters when attempting to view the 3D image on one of those Magic Eye posters. A combination of good front-facing camera hardware, Google’s Face Mesh and the proper algorithms, however, could go a ways toward both responding to — and predicting — user intention.
“The big tech companies like Google and Apple have gotten pretty close with gaze prediction, but just staring at something alone doesn’t get you there,” Associate Professor Chris Harrison says in a release. “The real innovation in this project is the addition of a second modality, such as flicking the phone left or right, combined with gaze prediction. That’s what makes it powerful. It seems so obvious in retrospect, but it’s a clever idea that makes EyeMU much more intuitive.”
The other key here, I think, is not attempting to do everything with just the eyes. More traditional input is still required, but the video shows how researchers can get a good deal done relying on a combination of gaze and gestures. Take a photo app. Looking at an image selects it. Pulling the phone closer to the face zooms in, and jerking it left or right applies filters. What’s most exciting here is how much of this can be done with existing hardware as a way to supplement touch and voice input.
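For the curious, here is a rough sketch of how that photo app pairing of gaze and gesture might be wired together. To be clear, this is not EyeMU’s code: the `Frame` type, the gesture labels and the `interpret` function are all hypothetical names made up for illustration, and the sketch assumes some upstream component has already worked out which item the user is looking at and which motion gesture, if any, the phone just registered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One hypothetical sensor reading (names are assumptions, not EyeMU's API)."""
    gazed_item: Optional[str]  # item the gaze estimate lands on, if any
    gesture: Optional[str]     # "pull_closer", "flick_left", "flick_right" or None


def interpret(frames: List[Frame]) -> List[Tuple[str, str]]:
    """Turn a stream of frames into (action, item) pairs.

    Gaze alone only selects an item; a motion gesture is what triggers
    an action on whatever is currently selected.
    """
    selected: Optional[str] = None
    actions: List[Tuple[str, str]] = []
    for frame in frames:
        # Looking at a new item selects it.
        if frame.gazed_item is not None and frame.gazed_item != selected:
            selected = frame.gazed_item
            actions.append(("select", selected))
        # Gestures are ignored until something has been selected.
        if selected is None or frame.gesture is None:
            continue
        if frame.gesture == "pull_closer":
            actions.append(("zoom_in", selected))
        elif frame.gesture in ("flick_left", "flick_right"):
            actions.append(("apply_filter", selected))
    return actions


if __name__ == "__main__":
    stream = [
        Frame("photo_3", None),           # user looks at a photo
        Frame("photo_3", "pull_closer"),  # then pulls the phone closer
        Frame(None, "flick_left"),        # then flicks it left
    ]
    for action in interpret(stream):
        print(action)
```

The point of the sketch is the split of responsibilities: the eyes choose the target, and a deliberate physical motion confirms the intent, so simply glancing around the screen never triggers anything beyond a selection.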