Toyota voice activation controls; photo by Chris Chase.
By Jim Kerr
Twenty-five years ago, Michael Knight talked with KITT (Knight Industries Two Thousand), his Trans Am sports car and crime-fighting partner in the TV show Knight Rider. The artificial intelligence programmed into KITT was science fiction, and some of the stunts performed by KITT, or the people operating it, are still science fiction. However, spoken communication between car and driver is quickly becoming reality. We can already tell some cars to change the radio station or turn down the volume. Ford’s SYNC system can receive a text message on your cell phone and read it aloud as you drive. Honda’s navigation system will listen as you either say a name or spell it out to enter a destination in the car’s mapping system. These systems may operate seamlessly, but a lot of work goes into them before they reach production. Let’s take a look behind the scenes at the latest system from Mercedes-Benz to see what it really takes to have a car speak with you.
Mercedes-Benz’s latest-generation LINGUATRONIC voice-activated systems are found in its S-Class cars. Mercedes introduced voice controls in 1996, but they operated only the onboard telephone. By 2000, the systems could also control the car radio and CD changer, and by 2002, the navigation system could be operated by voice commands. The first systems used 512 KB of memory and could recognize about 650 place names. Today the systems use more than 10 MB of memory and understand whole-word voice commands for more than 220,000 street names in the state of California alone! For Germany, the system will recognize 80,000 town names and 470,000 street names in just a few milliseconds.
To recognize your spoken word, the system computer digitizes the sound, converts it into frequency ranges and then analyzes each frequency range. It is looking for characteristics known as “phonemes”, which are the smallest sound components of a language. By identifying these individual phonemes and then combining them back together, the computer can compare the digital pattern with the dictionary in its memory and identify your command in only a few milliseconds. Each language has its own set of phonemes. For example, the German language uses around 40 phonemes. The system designers have programmed the Mercedes-Benz system to recognize not only German and English but also Spanish, French, Italian and Dutch. In addition, the navigation system can “speak” more than a dozen different languages, including Russian, Chinese, Japanese, Portuguese and Turkish.
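For readers who like to see the idea in code, here is a rough Python sketch of two of the steps described above: splitting a digitized sound into frequency ranges, and matching a string of recognized phonemes against a dictionary of commands. It is only a toy under stated assumptions, not how LINGUATRONIC is actually built. The function names, the three-entry command dictionary, the phoneme symbols and the use of edit distance for matching are all illustrative choices of ours.

```python
import cmath
import math


def band_energies(samples, num_bands=4):
    """Split a digitized sound into frequency ranges (naive DFT, toy sizes only)."""
    n = len(samples)
    half = n // 2  # only the first half of the spectrum is unique for real input
    spectrum = []
    for k in range(half):
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(coeff) ** 2)
    width = half // num_bands
    # Sum the energy that falls inside each equal-width frequency range.
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(num_bands)]


def edit_distance(a, b):
    """Count the insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical command dictionary: each command is stored as a phoneme sequence.
COMMANDS = {
    "radio on": ["R", "EY", "D", "IY", "OW", "AA", "N"],
    "radio off": ["R", "EY", "D", "IY", "OW", "AO", "F"],
    "volume down": ["V", "AA", "L", "Y", "UW", "M", "D", "AW", "N"],
}


def best_command(heard_phonemes):
    """Return the dictionary command closest to the phonemes that were heard."""
    return min(COMMANDS, key=lambda name: edit_distance(heard_phonemes, COMMANDS[name]))


# A pure tone whose energy lands in the third of four frequency ranges.
tone = [math.sin(2 * math.pi * 20 * t / 64) for t in range(64)]
energies = band_energies(tone)

# One phoneme misheard ("T" instead of "D") still matches the nearest command.
heard = ["R", "EY", "T", "IY", "OW", "AA", "N"]
command = best_command(heard)
```

Matching on the closest phoneme sequence rather than an exact one is what lets a sketch like this tolerate a single misheard sound. A production system would use far more sophisticated statistical models to do the same job.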
While the programmed system is very good, every person has a slightly different tone or pronunciation. To fine-tune the LINGUATRONIC system so it will understand every phrase, the system has an “after-training” function, where the driver can have a personal conversation with the system.
When the system talks to you, you typically hear a female voice, but in Turkey most drivers prefer a male voice. Mercedes uses a team of 12 women and one man, each speaking one language, to record all the feedback words, commands and sounds. These recording artists are chosen for their voice quality, and many come from radio or television or have movie sound synchronization experience.
At the recording studio, each of the team members makes thousands of takes, saying everything in as neutral a voice as possible. The goal is to make it sound like the person is sitting next to you in the car. It takes about three days for each person to record all the individual words, commands, numbers and names you hear from the system. The vehicle computer determines an appropriate response to your commands and then joins the individually recorded words together into a phrase. When heard, it sounds like the person is talking in smooth phrases.
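The joining step described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the Mercedes implementation: the clip table, the sample values and the `assemble_phrase` function are hypothetical, and a real system would work with proper audio files and smooth the joins between words.

```python
# Hypothetical library of studio recordings: one short list of audio samples per word.
RECORDED_CLIPS = {
    "turn": [0.1, 0.2, 0.1],
    "left": [0.3, 0.1],
    "right": [0.2, 0.4],
    "ahead": [0.5, 0.3, 0.2],
}


def assemble_phrase(words, clips, gap_samples=2):
    """Join individually recorded words into one phrase with a short pause between them."""
    phrase = []
    for index, word in enumerate(words):
        if word not in clips:
            raise KeyError(f"no recording for {word!r}")
        if index > 0:
            # Silence between words so the clips do not run into each other.
            phrase.extend([0.0] * gap_samples)
        phrase.extend(clips[word])
    return phrase


spoken = assemble_phrase(["turn", "left"], RECORDED_CLIPS)
```

Because every word is recorded once in a neutral voice, the same clip can be reused in any phrase the computer needs to build.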
In the past 13 years we have progressed from a few simple voice commands that were often misunderstood by the computer system to a sophisticated system that can learn your own speech patterns and respond with phrases that sound like a conversation. We smiled at the artificial intelligence of the KITT car on TV but now a lot of it has become reality. We can only imagine what the next 13 years will bring.