Apple, Google, Microsoft, Amazon: which voice assistant supports the most languages, and why?

via: 博客园 (cnblogs)     time: 2019/2/4 14:04:16     views: 446

In September 2018, Vocalize.ai, an AI startup, ran a test comparing the smart voice assistants from Google, Apple and Amazon, and found something interesting.

For example, all three voice assistants recognize American and Indian accents well, but the accuracy of Siri and Alexa drops sharply on Chinese accents.

For voice assistants, recognizing different accents in the same language is already a challenge, and learning a new language is even more difficult.

Samsung's Bixby, for example, did not gain support for German, French, Italian and Spanish, languages with more than 600 million speakers combined, until this autumn; Microsoft's Cortana took many years to support Spanish, French and Portuguese.

Why are voice assistants developing so slowly when AI has made great breakthroughs and is advancing rapidly? How can humanity rebuild the Tower of Babel?

Why is it so difficult for a voice assistant to support a new language?

To learn a language, a voice assistant has to master two major subjects: speech recognition and speech synthesis.

Recognition itself has two steps. The first is speech recognition proper, which converts speech into text. The second is semantic understanding, which relies mainly on natural language processing.
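The two steps can be sketched as a small pipeline. This is only an illustration under loud assumptions: the lookup table `FAKE_TRANSCRIPTS` stands in for a real acoustic model, and the keyword rules in `understand` stand in for a real natural language processing module. All names here are hypothetical and do not come from any vendor's assistant.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """What the user wants, plus any extracted details (hypothetical type)."""
    name: str
    slots: dict


# Assumption: a lookup table replaces the real speech-to-text model.
FAKE_TRANSCRIPTS = {
    "clip-001": "set an alarm for seven",
    "clip-002": "what is the weather in paris",
}


def speech_to_text(audio_id):
    """Step 1: convert speech into text (faked with a table lookup)."""
    return FAKE_TRANSCRIPTS.get(audio_id, "")


def understand(text):
    """Step 2: semantic understanding, here reduced to keyword rules."""
    words = text.lower().split()
    if "alarm" in words:
        return Intent("set_alarm", {"time": words[-1]})
    if "weather" in words:
        return Intent("get_weather", {"city": words[-1]})
    return Intent("unknown", {})


def handle(audio_id):
    """Run both steps in order: audio -> text -> intent."""
    return understand(speech_to_text(audio_id))


print(handle("clip-001"))
print(handle("clip-002"))
```

Even in this toy form, the rules in `understand` are tied to English words and English word order, which hints at why each new language needs its own understanding module.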

Picture from: Electronics Weekly

Speech recognition has made tremendous progress. In the past, automatic speech recognition (ASR) relied mainly on manually tuned statistical models that calculated the probability of word combinations in phrases. Deep neural networks have not only reduced the error rate but also largely removed the need for human supervision.
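To make "the probability of word combinations" concrete, here is a minimal sketch of a bigram model with add-one smoothing, one common form of such a statistical model. The toy corpus and the helper names are assumptions made for illustration; production systems of that era used far larger corpora and more elaborate smoothing.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count adjacent word pairs in a toy corpus (hypothetical helper)."""
    context_counts = Counter()  # how often a word is followed by something
    pair_counts = Counter()     # how often a specific pair occurs
    vocab = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        vocab.update(words)
        context_counts.update(words[:-1])
        pair_counts.update(zip(words, words[1:]))
    return context_counts, pair_counts, vocab


def bigram_probability(prev, word, model):
    """P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    context_counts, pair_counts, vocab = model
    return (pair_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def phrase_score(phrase, model):
    """Multiply the bigram probabilities along the phrase."""
    words = ["<s>"] + phrase.lower().split()
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= bigram_probability(prev, word, model)
    return score


# Assumption: three made-up sentences stand in for a real training corpus.
corpus = [
    "recognize speech with a voice assistant",
    "the assistant can recognize speech",
    "please recognize speech in a new language",
]
model = train_bigram_model(corpus)

# The word order seen in the corpus scores higher than the reversed order.
print(phrase_score("recognize speech", model))
print(phrase_score("speech recognize", model))
```

A recognizer can use scores like these to choose between candidate transcripts that sound alike. The counts depend entirely on the training corpus, so a model built this way has to be rebuilt for every language.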

However, basic language understanding is far from enough, and localization remains a huge challenge. According to one technician, building a query understanding module for a new language currently takes 30 to 90 days, depending on the intents to be covered. As noted at the beginning, even recognizing different accents within the same language is a huge challenge.

The differences between languages are greater still. At the grammatical level, for example, English adjectives usually appear before nouns, while adverbs can appear before or after the words they modify. This can easily confuse a voice assistant. Take the word "starfish": a speech-to-text engine can easily interpret "star" as an adjective modifying "fish".

Once the speech has been converted into text and understood, the voice assistant must also reply in a human-sounding voice.

Traditional speech synthesis consists mainly of a synthesis engine and a pre-recorded voice database. The synthesis engine is software that finds matching pronunciations in the voice database and stitches them together to turn text into speech. However, this "artificial voice" is very disjointed and sounds unnatural. To cover more words, traditional voice databases are usually very large.
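The lookup-and-join approach described above can be sketched in a few lines. This is a toy under stated assumptions: the "recordings" in `VOICE_DATABASE` are short made-up lists of sample values rather than real audio, the units are whole words, and `synthesize` is a hypothetical name, not a function from any real engine.

```python
# Assumption: each entry is a tiny fake "recording" for one word.
VOICE_DATABASE = {
    "hello": [0.1, 0.3, 0.2],
    "world": [0.4, 0.1],
    "good": [0.2, 0.2],
    "morning": [0.5, 0.3, 0.1],
}

# A short gap inserted between words.
SILENCE = [0.0]


def synthesize(text, database=VOICE_DATABASE):
    """Look up each word's recording and join the clips end to end.

    Returns the joined samples and the list of words with no recording.
    """
    samples = []
    missing = []
    for word in text.lower().split():
        clip = database.get(word)
        if clip is None:
            missing.append(word)  # nothing recorded for this word
            continue
        if samples:
            samples.extend(SILENCE)
        samples.extend(clip)
    return samples, missing


audio, missing = synthesize("Hello world")
print(audio, missing)

audio, missing = synthesize("Hello Siri")
print(audio, missing)
```

The second call shows both weaknesses at once: any word without a recording is simply skipped, which is why the database has to be very large, and clips are joined with no smoothing at the boundaries, which is why the result sounds disjointed.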

Today's speech synthesis technology, known as TTS (text-to-speech), uses mathematical models to recreate sounds and then combines them into words and sentences. The latest TTS systems also use deep learning, so they keep improving as they are trained.

At present, speech synthesis is much more mature than speech recognition and semantic understanding. Major Internet companies in China often use speech synthesis in their operations.

Which languages do the major voice assistants support?

Google Assistant

Google's voice assistant supports the largest number of languages. It currently supports 30 languages in 80 countries, including:

  • Arabic (Egypt, Saudi Arabia)
  • Bengali
  • Chinese (Traditional)
  • Danish
  • Dutch
  • English (Australia, Canada, India, Indonesia, Ireland, Philippines, Singapore, Thailand, UK, USA)
  • French (Canada, France)
  • German (Austria, Germany)
  • Gujarati
  • Hindi
  • Indonesian
  • Kannada
  • Italian
  • Japanese
  • Korean
  • Malay
  • Marathi
  • Norwegian
  • Polish
  • Portuguese (Brazil)
  • Russian
  • Spanish (Argentina, Chile, Colombia, Peru)
  • Swedish
  • Tamil
  • Telugu
  • Thai
  • Turkish
  • Urdu

Apple's Siri

After being overtaken by Google Assistant in 2018, Siri now ranks second in the number of languages supported. It supports 21 languages in 36 countries, including:

  • Arabic
  • Chinese (Mandarin, Shanghainese and Cantonese)
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Hebrew
  • Italian
  • Japanese
  • Korean
  • Malay
  • Norwegian
  • Portuguese
  • Russian
  • Spanish
  • Swedish
  • Thai

Microsoft's Cortana

  • Simplified Chinese
  • English (Australia, Canada, New Zealand, India, United Kingdom, United States)
  • French (Canada, France)
  • German
  • Italian
  • Japanese
  • Portuguese (Brazil)
  • Spanish (Mexico, Spain)

Amazon's Alexa

  • English (Australia, Canada, India, United Kingdom, United States)
  • French (Canada, France)
  • German
  • Japanese (Japan)
  • Spanish (Mexico, Spain)

Samsung's Bixby

  • English
  • Chinese
  • German
  • French
  • Italian
  • Korean
  • Spanish

How will voice assistants develop in the future?

Progress in speech recognition, semantic understanding and speech synthesis has come mainly from the introduction of deep learning.

In the future, relying more heavily on machine learning may greatly help research in the speech field.

Picture caption: The legendary Tower of Babel was abandoned because God confounded the language of human beings.

This is only one research direction. In general, however, using massive amounts of real conversation as a corpus for machine learning, rather than relying too heavily on manually defined recognition models, can effectively help voice assistants become smarter.
