By CGTN's Yang Chengxi
The use of AI promises transformative effects on various sectors. For most people, the most common use of AI is voice assistant services. But according to the world's top experts at the Global Artificial Intelligence and Robotics Summit, voice AI still has a long way to go.
A decade after the birth of Apple's Siri, the world's major tech firms are getting more aggressive on AI products based on voice interaction. In July, both Alibaba and Baidu, two of China's biggest tech companies, showed off their latest voice AI products and services.
"We are on the road of the fast generation where machines can recognize our voice and do useful things. We are impressed by that," said Subbarao Kambhampati, president of the Association for the Advancement of Artificial Intelligence.
Four key steps in voice AI. /CGTN Photo
Experts say the principle behind voice AI comes down to four key parts.
"Speech recognition, language understanding, context knowledge and also search," said Zhang Hongjiang, a technology investor.
"So what we see today, most of the so-called AI systems, we talk about Amazon's Echo and others, those are the products that involved the technologies I just mentioned," he added.
Echo, Amazon's AI assistant. /VCG Photo
Zhang said previous breakthroughs had focused on speech recognition and language understanding, but context understanding is the hardest part. Today's machines are very good at recognizing human speech, but experts say there is a substantial difference between recognizing speech and truly understanding its social and emotional context.
Today, voice assistants can only handle simple questions with confined parameters, such as "What's the weather like in Shenzhen today?"
"You asked weather, you asked Shenzhen, you asked today, and then using these three key words, it does a search. What it returns to you is weather forecast. That is entirely different than understanding," said Zhang.
Interface of Cortana, an AI assistant developed by Microsoft. /VCG Photo
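The keyword-based behavior Zhang describes can be illustrated with a short Python sketch. It is only a toy under stated assumptions, not how any commercial assistant is actually built: the vocabularies and the function names extract_keywords and build_search_query are hypothetical, and the "search" step is reduced to composing a query string.

```python
import re

# Hypothetical, deliberately tiny vocabularies (assumptions for illustration).
TOPICS = {"weather"}
PLACES = {"shenzhen", "beijing", "shanghai"}
DATES = {"today", "tomorrow"}


def extract_keywords(utterance):
    """Pick out a topic, a place and a date by plain word matching."""
    words = re.findall(r"[a-z]+", utterance.lower())

    def first_match(vocabulary):
        # First word of the utterance that appears in the vocabulary, if any.
        return next((w for w in words if w in vocabulary), None)

    return {
        "topic": first_match(TOPICS),
        "place": first_match(PLACES),
        "date": first_match(DATES),
    }


def build_search_query(utterance):
    """Turn the matched keywords into a search query, or None if any is missing."""
    slots = extract_keywords(utterance)
    if None in slots.values():
        return None
    return f"{slots['topic']} forecast {slots['place']} {slots['date']}"


if __name__ == "__main__":
    print(build_search_query("What's the weather like in Shenzhen today?"))
    # The opposite request yields the same query: keywords match, meaning is ignored.
    print(build_search_query("Don't tell me the weather in Shenzhen today"))
    # No known keywords, so the sketch cannot handle the question at all.
    print(build_search_query("Should I bring an umbrella?"))
```

In this sketch, a request and its negation produce an identical query, while a question that depends on context, such as whether to bring an umbrella, produces nothing. That gap is the difference between matching words and understanding them.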
That's keyword recognition plus Internet search. For anything beyond that, Zhang said, current voice AI technologies are still prone to misinterpreting commands. Enhancing context understanding will be a key challenge for tech firms.
"If you can master that, that would be great, but I think that's a huge threshold. I would believe that it will happen, I am an optimist, but I don't think it's gonna happen tomorrow, I don't think it's gonna happen very soon," said Kambhampati.