Voice, the predominant interface of the future
Minus 12 degrees! I was standing at the bus stop and thought: “Thank you, dear speech recognition, for making sure my hands don’t freeze!” I remember standing at the same bus stop a few winters ago, having to type a message into my mobile phone. It was painful. Today I can simply talk to my phone, and it recognizes my spoken language and converts it into text.
Voice is king
Voice will be the predominant interface of the future, not only in winter. 😉 It is simply the easiest way for humans to interact, whether with each other or with technology. It is completely natural. Thanks to artificial intelligence, machines and digital systems are increasingly able to understand our spoken words and respond to them.
A bit of theoretical background
Natural language processing (NLP) enables machines and people to communicate with each other in natural language. The basis of NLP is natural language understanding (NLU): truly understanding language, which means grasping not only the meaning of the words but also the grammar and the context in which they were expressed.
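To make the NLU step concrete, here is a deliberately tiny sketch of what “understanding” an utterance means in practice: mapping it to an intent plus extracted entities. Real assistants use trained models; the keyword rules, intent names, and city list below are purely illustrative assumptions.

```python
# Toy NLU: map an utterance to an intent and entities.
# Keyword rules stand in for a trained classifier (illustration only).

def understand(utterance: str) -> dict:
    """Return a crude intent/entity interpretation of an utterance."""
    text = utterance.lower()

    # Intent detection via simple keyword rules.
    if "weather" in text:
        intent = "get_weather"
    elif "message" in text or "text" in text:
        intent = "send_message"
    else:
        intent = "unknown"

    # Entity extraction: pick out a known city name, if any.
    cities = {"berlin", "munich", "hamburg"}
    entities = {"city": next((c.title() for c in cities if c in text), None)}

    return {"intent": intent, "entities": entities}

print(understand("What is the weather in Berlin today?"))
# → {'intent': 'get_weather', 'entities': {'city': 'Berlin'}}
```

The gap the article describes is exactly what this sketch cannot do: anything outside its hand-written rules, such as “I’m hungry as a bear,” falls straight into the `unknown` bucket.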
Today, artificial narrow intelligence (ANI) struggles with metaphors. A statement such as “I’m hungry as a bear” presupposes a certain knowledge of the world, namely that bears are animals and that they eat a lot. People of a certain age understand such metaphors easily. An artificial narrow intelligence does not, because it has no general knowledge; in other words, an ANI cannot respond to unplanned input. An artificial general intelligence, on the other hand, can deal with this and is able to understand metaphors, colloquial language, and exaggerations.
In addition to capturing and understanding the input, it is just as important to render the recorded and evaluated data in written or spoken words. Natural language generation (NLG) converts structured, analyzed data into comprehensible text.
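The NLG direction can be sketched just as simply: structured data goes in, a readable sentence comes out. The template and field names here are assumptions for illustration; production systems use far more sophisticated generation.

```python
# Toy NLG: render an analyzed, structured record as natural-language text.

def generate(data: dict) -> str:
    """Turn a structured weather record into a sentence."""
    return (f"In {data['city']} it is currently {data['temp']} degrees "
            f"with {data['condition']}.")

print(generate({"city": "Berlin", "temp": -12, "condition": "light snow"}))
# → In Berlin it is currently -12 degrees with light snow.
```

Together with the NLU step, this closes the loop the article describes: speech in, meaning extracted, answer generated, speech out.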
Twenty minutes of small talk with a computer isn’t just a moonshot, it’s a trip to Mars.
“People are expecting Alexa to talk to them just like a friend,” says Ashwin Ram, who leads Alexa’s AI research team. Taking part in human conversation — with all its infinite variability, abrupt changes in context, and flashes of connection — is widely recognized as one of the hardest problems in AI.1
Above all, the difficulty of a conversation with humans lies not only in identifying the meaning of spoken words, in handling colloquial language, or in understanding the context. It lies in the missing goal. Machines perform best when they can focus on a target, but small talk has no clear goal. It is just chatting.
While real conversation still seems to be the hardest challenge for AI, it will be a matter of course in the future. At the latest with the arrival of an artificial general intelligence, we will be able to communicate with a machine as if it were a human being.