Thanks to artificial intelligence, traveling abroad has never been simpler.
The Google Translate app lets users translate text instantly. In the app, just point your camera at the text you want to translate and you’ll see it transform into your desired language live, right before your eyes—no Internet connection or cell phone data needed. This handy feature has been available for some time, but it had only been compatible with seven languages. Now, thanks to machine learning, Google has upgraded the app to instantly translate 27 languages.
“So the next time you’re in Prague and can’t read a menu, we’ve got your back,” Otavio Good, software engineer at Google, wrote on the company’s research blog.
As of today, in addition to translating between English and French, German, Italian, Portuguese, Russian and Spanish, the app can handle 20 more languages in real time. It translates to and from English for Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Filipino, Finnish, Hungarian, Indonesian, Lithuanian, Norwegian, Polish, Romanian, Slovak, Swedish, Turkish and Ukrainian, and one-way from English to Hindi and Thai. And if you opt to snap a picture instead of watching the text translate live, a total of 37 languages are supported.
So how was Google able to increase the number of available languages? The company first acquired Word Lens, formerly an augmented reality translation application, and then used machine learning and convolutional neural networks to enhance the app’s capabilities. Advances in image recognition were key.
“Five years ago, if you gave a computer an image of a cat or a dog, it had trouble telling which was which. Thanks to convolutional neural networks, not only can computers tell the difference between cats and dogs, they can even recognize different breeds of dogs,” Mr. Good said. “Yes, they’re good for more than just trippy art—if you’re translating a foreign menu or sign with the latest version of Google’s Translate app, you’re now using a deep neural net.”
Step by step
First, Translate must weed out background clutter and locate the text. When it finds “blobs of pixels” of the same color, it treats them as letters. And when those blobs sit close to each other, it treats them as a continuous line to be read.
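The blob-finding idea can be illustrated with a toy example. The sketch below is not Google's code: it assumes a tiny grid of color values, a known background value, and a single horizontal band of text, and the names find_blobs and group_into_lines are made up for illustration. It flood-fills same-color regions into bounding boxes, then chains boxes that sit close together into lines.

```python
from collections import deque


def find_blobs(image, background=0):
    """Return bounding boxes (top, left, bottom, right) of same-color regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] == background:
                continue
            color = image[r][c]
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            # Breadth-first flood fill over 4-connected pixels of the same color.
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append((top, left, bottom, right))
    return blobs


def group_into_lines(blobs, max_gap=2):
    """Chain blobs left to right; a gap wider than max_gap starts a new line.

    Assumption: all blobs lie in one horizontal band, so only the
    horizontal gap between neighbors is checked.
    """
    lines = []
    for blob in sorted(blobs, key=lambda b: b[1]):
        if lines and blob[1] - lines[-1][-1][3] - 1 <= max_gap:
            lines[-1].append(blob)
        else:
            lines.append([blob])
    return lines
```

A real detector would also have to tolerate slightly different shades, multiple rows of text, and blobs that are not letters at all, which is where the next step comes in.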
Next, the app must recognize what each individual letter is. This is where deep learning comes in.
“We use a convolutional neural network, training it on letters and non-letters so it can learn what different letters look like,” reads the blog post.
The researchers had to train the software using not just “clean-looking letters,” but “dirty” ones as well. “Letters out in the real world are marred by reflections, dirt, smudges, and all kinds of weirdness,” Mr. Good wrote. “So we built our letter generator to create all kinds of fake ‘dirt’ to convincingly mimic the noisiness of the real world: fake reflections, fake smudges, fake weirdness all around.”
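To give a feel for what such a letter generator might do, here is a minimal sketch. It is an illustration under stated assumptions, not the generator Google built: the 5x5 bitmaps in GLYPHS, the smudge_rate and glare parameters, and the function names are all hypothetical. It produces labeled letters and random non-letters, then adds fake smudges (flipped pixels) and a fake reflection (one washed-out row).

```python
import random

# Hypothetical 5x5 letter bitmaps, for illustration only.
GLYPHS = {
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}


def render(rows):
    """Turn a text bitmap into floats: 1.0 is ink, 0.0 is paper."""
    return [[1.0 if ch == "#" else 0.0 for ch in row] for row in rows]


def add_dirt(pixels, rng, smudge_rate=0.08, glare=0.5):
    """Return a copy with random smudges and one washed-out 'reflection' row."""
    dirty = [row[:] for row in pixels]
    for row in dirty:
        for x in range(len(row)):
            if rng.random() < smudge_rate:
                row[x] = 1.0 - row[x]  # smudge: flip ink and paper
    glare_row = rng.randrange(len(dirty))
    # Reflection: fade whatever ink is in one randomly chosen row.
    dirty[glare_row] = [value * (1.0 - glare) for value in dirty[glare_row]]
    return dirty


def make_examples(count, seed=0):
    """Generate (pixels, label) pairs mixing dirty letters and non-letters."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        if rng.random() < 0.5:
            label = rng.choice(sorted(GLYPHS))
            pixels = render(GLYPHS[label])
        else:
            label = "non-letter"
            pixels = [[float(rng.random() < 0.3) for _ in range(5)]
                      for _ in range(5)]
        examples.append((add_dirt(pixels, rng), label))
    return examples
```

Pairs like these would then be fed to a classifier. Seeding the random generator keeps the synthetic set reproducible from run to run.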
The third step is looking up the recognized letters in a dictionary to get the translations. To improve accuracy, the dictionary lookups are approximate, so a word can still be found if, say, an “S” is misread as a “5.”
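One simple way to make a lookup approximate is sketched below. This is an assumed approach, not a description of how Translate does it: the CONFUSABLE table and the three-word DICTIONARY are invented for the example. It first swaps digits that are commonly mistaken for letters, then falls back to the closest spelling using Python's standard difflib module.

```python
import difflib

# Hypothetical table of digits that recognition might confuse with letters.
CONFUSABLE = str.maketrans({"5": "s", "0": "o", "1": "l"})

# Hypothetical Spanish-to-English dictionary, for illustration only.
DICTIONARY = {"sopa": "soup", "queso": "cheese", "salida": "exit"}


def lookup(recognized, cutoff=0.75):
    """Translate a recognized word, tolerating small recognition errors."""
    word = recognized.lower().translate(CONFUSABLE)
    if word in DICTIONARY:
        return DICTIONARY[word]
    # Fall back to the most similar dictionary entry, if one is close enough.
    close = difflib.get_close_matches(word, list(DICTIONARY), n=1, cutoff=cutoff)
    return DICTIONARY[close[0]] if close else None


print(lookup("5OPA"))   # soup
print(lookup("qeso"))   # cheese
print(lookup("xyzzy"))  # None
```

The cutoff is the trade-off to tune: set it too low and random clutter starts matching real words, too high and a single misread letter defeats the lookup.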
Lastly, the translated text is rendered on top of the original in the same style.
“We can do this because we’ve already found and read the letters in the image, so we know exactly where they are. We can look at the colors surrounding the letters and use that to erase the original letters. And then we can draw the translation on top using the original foreground color,” the blog post reads.
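The erase-and-redraw step can also be shown in miniature. The sketch below is an assumption-laden toy rather than the app's renderer: the image is a grid of color codes, the translated text arrives as a small bitmap that fits inside the original box, and the names surrounding_color and replace_text are hypothetical. It picks the most common color in the ring of pixels just outside the text box, paints the box with it, and draws the new text in the original foreground color.

```python
from collections import Counter


def surrounding_color(image, box):
    """Most common color in the one-pixel ring just outside the box.

    Assumes the box does not cover the whole image, so the ring is not empty.
    """
    top, left, bottom, right = box
    ring = []
    for y in range(top - 1, bottom + 2):
        for x in range(left - 1, right + 2):
            inside = top <= y <= bottom and left <= x <= right
            if not inside and 0 <= y < len(image) and 0 <= x < len(image[0]):
                ring.append(image[y][x])
    return Counter(ring).most_common(1)[0][0]


def replace_text(image, box, new_mask, foreground):
    """Erase the box with the surrounding color, then draw new_mask on top.

    new_mask is a list of strings where '#' marks a foreground pixel;
    it is assumed to fit inside the box.
    """
    top, left, bottom, right = box
    background = surrounding_color(image, box)
    out = [row[:] for row in image]  # leave the input image untouched
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            out[y][x] = background
    for dy, mask_row in enumerate(new_mask):
        for dx, ch in enumerate(mask_row):
            if ch == "#":
                out[top + dy][left + dx] = foreground
    return out
```

On a real photo the background is rarely one flat color, so a production version would need to blend or interpolate rather than fill with a single value.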
To complete all of these steps in real time without an Internet or data connection, the Google team developed a very small neural net, which puts an upper bound on the density of information it can handle. Since the team was generating its own training data, it was important to include the right data but nothing extra, so the network doesn’t spend its limited capacity on unimportant things. For example, it needs to recognize a letter with a slight amount of rotation, but not one that has been rotated too far.
In the end, users get 20 more languages at the same fast speed.