Lip Reading, 3D Desktops, And NUI: Microsoft Plans To Reinvent User Interaction

Deep in the skunk works of its Research and Labs divisions, secreted around the Seattle area, Microsoft is working on totally reinventing the way people interact with their computers. Very little is out in the open or in more than a prototype form, but the work is unquestionably being done.

Last week it transpired that Microsoft is working on building Kinect into the bezels of laptops, and after that, presumably, tablets and eventually mobile phones. But it’s not just about building out the installed base for Dance Central 3. It’s about enabling the next generation of awareness in our electronics. The iPhone ushered in an era where our devices know when we touch them. Microsoft is working on the next one, in which our devices will simply know us.

How do you, as a person, experience the world around you? You mostly see and hear, and to a lesser extent you touch, taste, smell. Our devices, however, are largely restricted to an extremely limited sense of touch. Why shouldn’t they be more like us?

There’s a good reason, actually: computers don’t need to be like people because computers aren’t people. For decades this held true: the computer’s primary purpose was to sit still and perform calculations humans couldn’t do. Interaction with a computer was strictly input and output. You didn’t interact so much as instruct, then wait for the result.

But mobile phones and touchscreens and laptops began changing the idea of a computer into something more personal, more interactive, more two-way. And technology exists to let our devices become more human. Why not let them?

Microsoft wants to. Despite its reputation among tech enthusiasts as a sort of stodgy blue-chip still coasting on the PC explosion of the late ’90s and early 2000s, its R&D divisions are world-class and turn out genuinely innovative ideas and devices all the time. The trouble, briefly stated, is that implementing these ideas as products that fit into the Microsoft ecosystem isn’t easy, and even if it were, Microsoft has no talent for it.

But this work on “Natural User Interaction,” or NUI, is more promising. People have embraced the idea in gaming: the Wii led the way, and the Kinect brought the future into your living room, though the future is a little laggy and its voice controls are spotty. People are simply interested in new ways of interacting with their content and devices. For years the promise of a different kind of interaction has dangled before us, usually in sci-fi shows and movies, and people have always been intrigued by it.

So people want it — and Microsoft wants to make it — and they have the technology. Purchasing the IP behind the Kinect was an extremely smart move, maybe smarter than they know. What started out as a way to cash in on the market the Wii had created has snowballed into an entirely new form of interacting with computers, and a way for Microsoft to differentiate itself meaningfully for years to come.

It was reported to me that one of the things the new Kinect/depth/IR sensors will do is read lips. At first it sounds silly. Why? Maybe so it can better interpret your words from across the room, or in a loud environment. You won’t have to turn the music down to search and navigate the web on your TV or tablet.

And then it becomes clear that lip reading is just one part of a larger suite of “senses” the device would have. The new devices are to have face recognition and voice recognition, so your password becomes you: your face and your voice speaking the passphrase, not someone else’s voice, and not a printout of your face. They’ll be able to pick you out of a crowd, say a small party, and tell when you’re giving a command, because you make eye contact and move your lips. Again, it sounds perfectly ridiculous until it starts sounding perfectly natural.
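To make the idea concrete, here’s a minimal sketch of that kind of multimodal gating in Python. Everything in it is hypothetical: the signal names, thresholds, and the Frame structure are stand-ins for whatever a real sensor stack would report, not anything Microsoft has described.

```python
# Hypothetical sketch of multimodal command gating. Field names and
# thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class Frame:
    face_match: float   # confidence the face belongs to the owner (0-1)
    voice_match: float  # confidence the voice belongs to the owner (0-1)
    eye_contact: bool   # is the user looking at the device?
    lips_moving: bool   # did the lip tracker detect articulation?

def should_accept_command(frame: Frame,
                          face_threshold: float = 0.9,
                          voice_threshold: float = 0.9) -> bool:
    """Accept a spoken command only when the device is reasonably sure
    the right person is present, addressing it, and actually speaking."""
    identity_ok = (frame.face_match >= face_threshold and
                   frame.voice_match >= voice_threshold)
    addressing_ok = frame.eye_contact and frame.lips_moving
    return identity_ok and addressing_ok

# A photo of the owner scores high on face_match but shows no lip movement;
# a voice recording scores on voice_match but makes no eye contact.
print(should_accept_command(Frame(0.95, 0.97, True, True)))   # True
print(should_accept_command(Frame(0.95, 0.97, False, True)))  # False
```

The appeal of stacking channels this way is that each one covers another’s spoofing weakness, which is presumably why the rumored system pairs face, voice, gaze, and lips rather than relying on any one of them.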

Another feature described was a sort of 3D desktop on which you could actually grab files and place them here and there. This has been tried before, of course, and Windows 8 is looking decidedly two-dimensional, so it’s probably more of a research project than anything. But it’s still interesting. Think of the basic gestures you might be able to make. One was described as pulling out a drawer. In the surprisingly resilient desktop metaphor of files and folders, what could be more natural? Or perhaps raising your hand palm up to show the taskbar or dock? Trace your finger in a counter-clockwise circle to undo, clockwise to redo?
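Even a gesture as simple as that undo/redo circle hides a concrete recognition problem: which way was the loop traced? One standard way to answer it, sketched below with made-up sample data, is the signed area of the path (the shoelace formula), which is positive for a counter-clockwise loop and negative for a clockwise one. The gesture-to-command mapping here is my invention, not anything from the reported prototypes.

```python
# Classify a traced loop as undo (counter-clockwise) or redo (clockwise)
# using the sign of its enclosed area. Coordinates are hypothetical samples
# of a fingertip path.
from typing import List, Tuple

def signed_area(path: List[Tuple[float, float]]) -> float:
    """Shoelace formula over the sampled positions, treated as a polygon."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(path, path[1:] + path[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0

def circle_command(path: List[Tuple[float, float]]) -> str:
    return "undo" if signed_area(path) > 0 else "redo"

# Note: in screen coordinates y usually grows downward, which flips the
# sign; this sketch assumes mathematical (y-up) coordinates.
ccw_loop = [(0, 0), (1, 0), (1, 1), (0, 1)]  # counter-clockwise square
print(circle_command(ccw_loop))        # undo
print(circle_command(ccw_loop[::-1]))  # redo
```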

User experience reflects both the needs of the user and the capabilities of the device. For a few years now we’ve been satisfied with running our fingers along a slab of glass, producing an electrical signal interpreted as a point or blob — mainly because capacitive screens got good and cheap, and nobody wants to plug a mouse into their phone. But there are many other ways of interacting with our new mobile objects and information. Soon the glass touchscreen will seem as quaint as the command-line interface.
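That “point or blob” step is worth pausing on, because it is the whole interface in miniature: the touch controller scans a grid of capacitance changes and collapses the blob your fingertip makes into a single coordinate. Here’s a rough sketch of that reduction, assuming a simple weighted centroid over an invented 3×3 grid of readings; real controllers are more sophisticated, but the principle is the same.

```python
# Reduce a grid of capacitance deltas (invented values) to one touch point
# via a weighted centroid.
from typing import List, Optional, Tuple

def touch_centroid(grid: List[List[float]]) -> Optional[Tuple[float, float]]:
    """Weighted centroid (col, row) of the capacitance readings."""
    total = x_sum = y_sum = 0.0
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            total += v
            x_sum += c * v
            y_sum += r * v
    if total == 0:
        return None  # no touch detected
    return (x_sum / total, y_sum / total)

grid = [
    [0, 1, 0],
    [1, 8, 2],
    [0, 2, 0],
]
print(touch_centroid(grid))  # ~(1.07, 1.07), near the center cell
```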

And yet, some are no doubt thinking, we still have some command-line interfaces in use. Sure. And mice and keyboards are still better for productivity, and a pen and paper is better for sketching out ideas, and headphones are better for listening to music in public. There are countless use cases and potential applications of technology, but it’s good to recognize when one should give way or simply isn’t applicable.

Microsoft is working hard at this, and you’d better believe that Apple is too, though it isn’t nearly as open about its research. And for once, Apple seems to actually be missing a piece of the technology pie: Microsoft has a head start in the world of NUI, having purchased and developed depth and person-sensing technology for at least two years now. Apple can always throw money at the problem, but it’s pretty clear that Microsoft has perceived this rare advantage and will use it as a wedge wherever possible.

This shouldn’t be taken as an indication that Windows 8 is going to be anything other than advertised, but I think it will be a test bed for some major changes coming down the line. Microsoft wants to change the way people interact with computers because it sees, hopefully not too late, that the old way, the PC way of treating a computer like a box that computes things, is on its way out in a hurry. So if computers are going to be a part of the real world, they need to be able to live in that world. Eyes, ears, and who knows what else. It’s only creepy until you can’t live without it.

[images: Matthew Fisher/Stanford, Wolfgang Herfuntner]