Big Data On The Keyboard

Editor’s note: Maoz Shacht is the CEO of Ginger Software, a mobile keyboard developer.

The Information Age has the potential to overwhelm. When that happens on a technical front – when the volume of data is so large that traditional databases cannot handle it, or cannot handle it well – the industry refers to the phenomenon as “big data.” The term has also come to refer to the technology used to process such vast amounts of data.

Thus, any time a category of data contains billions (or even trillions) of records from all over the web and other sources, we are talking about big data. Often we don’t even notice the “big data” aspect of our daily encounters with technology, such as the autocorrect feature in mobile devices, word-processing programs, email clients and more.

Autocorrect and Word Suggest

Despite the often comical results of the autocorrect feature – there are any number of websites devoted to showcasing its humorous (and often racy) errors – the capacity of the device to correct your typing and even predict your next word is remarkably helpful, as it saves you from the embarrassment of your own typos.

It is also daunting, if you think about it. Nearly any combination of letters that you type, in nearly any sequence, will yield (mostly) reasonable suggestions from your smartphone. When you factor in support for foreign languages and the “swipe” option of many smartphone keyboards, the near-infinite number of combinations is a matter of big data, indeed.

Word-suggest and autocorrect work based on an algorithm that essentially checks the combination of letters that you type against the dictionary that is loaded into your smartphone – and there is more than one dictionary available. For example, every time I type in a foreign alphabet, my phone offers me a dictionary in that language.

When the letters you’ve typed match entries in the dictionary, the smartphone offers those matches as possibilities for the word you are typing. If one is correct, accepting the suggested word shortens your typing time and makes your smartphone communication more efficient. If no match is found, then the phone is programmed to offer alternatives, some of which are correct, some of which make sense even if they weren’t what you’d had in mind, and some of which provide the fodder for the comical online autocorrect compilations.
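The two-step lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `DICTIONARY` list and the `suggest` function are hypothetical, and the fallback uses the standard library's `difflib.get_close_matches` as a stand-in for whatever similarity measure a real keyboard applies.

```python
import difflib

# Hypothetical miniature dictionary; a real keyboard ships a far larger one.
DICTIONARY = ["taco", "tacos", "tax", "taxes", "taxi", "taxis", "text", "texts"]


def suggest(typed, dictionary=DICTIONARY, limit=3):
    """Return completions for the typed letters, or near matches if none exist."""
    typed = typed.lower()
    # Step 1: dictionary words that begin with the letters typed so far.
    completions = [word for word in dictionary if word.startswith(typed)]
    if completions:
        # Shorter completions first (an arbitrary choice for this sketch).
        return sorted(completions, key=len)[:limit]
    # Step 2: nothing matches, so fall back to similarly spelled words.
    return difflib.get_close_matches(typed, dictionary, n=limit, cutoff=0.6)


print(suggest("tax"))    # completions of a valid prefix
print(suggest("taxos"))  # no prefix match, so similar spellings are offered
```

With this toy dictionary, "tax" is completed directly, while "taxos" falls through to the similarity step and comes back with "taxis," "tacos" and "taxes" as candidates.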

Finding the Right Words

Programmers face some challenges in determining which key strokes yield which suggested words, including:

  1. Creating a comprehensive dictionary – one that isn’t watered down, but is still manageable and modern – including the modern slang that is likely to show up in text messages, for example.
  2. Determining a language model that has no significant deficits – one that examines the words you are typing in their context and offers an educated suggestion, as it were, for the correct spelling.

That is, if you type “taxos,” did you mean “taxis” or “tacos”? Your keyboard offers both options. If, however, you had intended to type “taxes,” the keyboard needs context, such as the preceding words “there’s nothing sure but death and…,” in order to suggest it. Only the most sophisticated autocorrects will get that right; otherwise, you are still contending with the choice of taxis or tacos (or taxos). Anyone who has used autocorrect will be impressed by how often it suggests the correct term.
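To make the role of context concrete, here is a minimal sketch that first collects candidates within one letter of what was typed and then ranks them by how often each follows the previous word. Everything in it is an assumption for illustration: the `BIGRAM_COUNTS` figures are invented, and a simple word-pair count stands in for the far richer language models that real keyboards use.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete a letter from a
                            curr[j - 1] + 1,      # insert a letter into a
                            prev[j - 1] + cost))  # substitute (or keep) a letter
        prev = curr
    return prev[-1]


# Hypothetical counts of how often one word follows another (made up for this sketch).
BIGRAM_COUNTS = {
    ("and", "taxes"): 50,
    ("and", "tacos"): 8,
    ("and", "taxis"): 5,
    ("fish", "tacos"): 40,
    ("hail", "taxis"): 30,
}
CANDIDATES = ["taxis", "tacos", "taxes"]


def rank(typed, previous_word, max_distance=1):
    """Keep candidates close to the typed word, most likely in context first."""
    near = [w for w in CANDIDATES if edit_distance(typed, w) <= max_distance]
    return sorted(near, key=lambda w: -BIGRAM_COUNTS.get((previous_word, w), 0))


print(rank("taxos", "and"))   # "taxes" ranks first after "death and"
print(rank("taxos", "fish"))  # "tacos" ranks first after "fish"
```

All three candidates are one letter away from "taxos," so spelling alone cannot separate them; in this sketch only the preceding word does.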

How Does the Keyboard Know?

The spell-checker of Google’s search engine learns your preferences and corrects accordingly. Most phone keyboards, however, are less sophisticated – in part because collecting a record of people’s typing and creating a database from it would be a violation of everyone’s right to privacy.

The autocorrect dictionary gleans its words from a corpus of articles that are available in the public domain. Programmers have devised methods of analysis that pay attention to the way we organize our sentences, the prominence and repetition of any given word, spelling and possible transposition of letters and, of course, the keyboard layout that makes hitting the wrong key all too easy.
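The keyboard-layout factor lends itself to a short sketch. The code below places each letter on an approximate QWERTY grid and treats two keys as neighbors when they sit close together. The row offsets, the `threshold` value and the function names are all assumptions chosen for illustration, not measurements of any real keyboard.

```python
import math

# Approximate QWERTY rows with an assumed horizontal stagger for each row.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.5), ("zxcvbnm", 1.5)]

# Map each letter to an (x, y) position on the grid.
KEY_POS = {
    letter: (col + offset, row)
    for row, (letters, offset) in enumerate(ROWS)
    for col, letter in enumerate(letters)
}


def key_distance(a, b):
    """Straight-line distance between two keys on the approximate grid."""
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(x1 - x2, y1 - y2)


def is_neighbor(a, b, threshold=1.5):
    """True if the two keys are different but close enough for a finger slip."""
    return a != b and key_distance(a, b) < threshold


print(is_neighbor("o", "i"))  # adjacent keys on the top row
print(is_neighbor("x", "c"))  # adjacent keys on the bottom row
print(is_neighbor("o", "e"))  # far apart on the top row
```

Under these assumptions, a slip from "i" to "o" or from "c" to "x" looks far more plausible than one from "e" to "o," which is one way a layout-aware keyboard could prefer "taxis" and "tacos" over "taxes" for the typo "taxos."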

That said, when you correct an autocorrected word, your phone learns the spelling you prefer. This is very common with proper nouns or invented words, such as company jargon.
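One simple way to picture this kind of on-device learning is a personal word list that grows whenever you undo a correction. The sketch below is hypothetical throughout: the `PersonalDictionary` class, its method names and the `learn_after` threshold are assumptions, and actual keyboards may learn in quite different ways.

```python
from collections import Counter


class PersonalDictionary:
    """Remembers words the user keeps restoring after autocorrect changes them."""

    def __init__(self, learn_after=2):
        # Assumed threshold: how many rejections before a word is trusted.
        self.learn_after = learn_after
        self.rejections = Counter()
        self.learned = set()

    def record_rejection(self, word):
        """Call when the user undoes a correction and restores `word`."""
        self.rejections[word] += 1
        if self.rejections[word] >= self.learn_after:
            self.learned.add(word)

    def should_autocorrect(self, word):
        """Learned words are left alone; anything else may still be corrected."""
        return word not in self.learned


personal = PersonalDictionary()
personal.record_rejection("Shacht")
print(personal.should_autocorrect("Shacht"))  # still eligible for correction
personal.record_rejection("Shacht")
print(personal.should_autocorrect("Shacht"))  # now learned and left alone
```

Keeping this list on the device itself would be consistent with the privacy concern raised earlier, since nothing about your typing has to leave the phone.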

Where Big Data Comes In

Without big data to manage the volume of potential letter configurations, there would be very little to say about smart keyboards. Yet big data holds even more promise for smart keyboards than the tools they have provided thus far. As phone technology becomes able to store more information, phone dictionaries will become not only larger, but also smarter.

As we move into the future, keyboard developers will use big data and machine learning to enhance all kinds of keyboard-dependent and context-based functions, for an improved experience across the (key)board.