Shared posts

29 Jul 14:40

How Google Translate squeezes deep learning onto a phone

by Research Blog
Posted by Otavio Good, Software Engineer, Google Translate

Today we announced that the Google Translate app now does real-time visual translation of 20 more languages. So the next time you’re in Prague and can’t read a menu, we’ve got your back. But how are we able to recognize these new languages?

In short: deep neural nets. When the Word Lens team joined Google, we were excited for the opportunity to work with some of the leading researchers in deep learning. Neural nets have gotten a lot of attention in the last few years because they’ve set all kinds of records in image recognition. Five years ago, if you gave a computer an image of a cat or a dog, it had trouble telling which was which. Thanks to convolutional neural networks, not only can computers tell the difference between cats and dogs, they can even recognize different breeds of dogs. Yes, they’re good for more than just trippy art—if you're translating a foreign menu or sign with the latest version of Google's Translate app, you're now using a deep neural net. And the amazing part is it can all work on your phone, without an Internet connection. Here’s how.

Step by step

First, when a camera image comes in, the Google Translate app has to find the letters in the picture. It needs to weed out background objects like trees or cars and pick up on the words we want translated. It looks for blobs of pixels of similar color that sit near other blobs of similar color. Those blobs are probably letters, and when they line up next to each other, they form a continuous line of text we should read.
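
For the curious, here is a minimal sketch of that kind of blob detection using OpenCV's connected-component analysis; the thresholding choice and the size limits are illustrative placeholders, not the app's actual parameters.

```python
# Sketch: find candidate letter blobs via connected components (illustrative, not Google's pipeline).
import cv2

def candidate_letter_boxes(image_bgr, min_area=30, max_area=5000):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Binarize so that similarly colored text pixels group together as foreground.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Label connected blobs of foreground pixels and collect their bounding boxes.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if min_area <= area <= max_area:  # weed out tiny noise and huge background objects
            boxes.append((x, y, w, h))
    # Order boxes roughly top-to-bottom, then left-to-right; neighbouring boxes at a
    # similar height are the candidates that form a line of text to read.
    boxes.sort(key=lambda b: (b[1], b[0]))
    return boxes
```
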
Second, Translate has to recognize what each letter actually is. This is where deep learning comes in. We use a convolutional neural network, training it on letters and non-letters so it can learn what different letters look like.
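
To give a sense of what such a classifier looks like in code, here is a small convolutional network in PyTorch; the layer sizes, input resolution, and number of character classes are illustrative guesses rather than the network Google actually ships.

```python
# Sketch of a small convolutional letter classifier (illustrative sizes, not Google's model).
import torch
import torch.nn as nn

NUM_CLASSES = 100  # hypothetical: letters, digits, punctuation, plus a "not a letter" class

letter_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # input: 1-channel grayscale letter crop
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, NUM_CLASSES),  # assumes 32x32 input crops
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(letter_net.parameters(), lr=1e-3)

def train_step(images, labels):
    # One training step on a batch of (letter image, class label) pairs.
    optimizer.zero_grad()
    loss = criterion(letter_net(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```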

But interestingly, if we train just on very “clean”-looking letters, we risk not understanding what real-life letters look like. Letters out in the real world are marred by reflections, dirt, smudges, and all kinds of weirdness. So we built our letter generator to create all kinds of fake “dirt” to convincingly mimic the noisiness of the real world—fake reflections, fake smudges, fake weirdness all around.

Why not just train on real-life photos of letters? Well, it’s tough to find enough examples in all the languages we need, and it’s harder to maintain the fine control over what examples we use when we’re aiming to train a really efficient, compact neural network. So it’s more effective to simulate the dirt.
Some of the “dirty” letters we use for training. Dirt, highlights, and rotation, but not too much because we don’t want to confuse our neural net.
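
A toy version of that kind of letter generator might look like the sketch below; the particular smudges, blur, and rotation bound are placeholder choices for illustration, not the real generator's parameters.

```python
# Sketch: render a clean glyph, then add bounded "dirt" (illustrative, not the real generator).
import random
from PIL import Image, ImageDraw, ImageFilter, ImageFont

def dirty_letter(char, size=32, max_rotation=10):
    # Start from a clean, dark glyph on a white background.
    img = Image.new("L", (size, size), color=255)
    ImageDraw.Draw(img).text((size // 4, size // 4), char, fill=0,
                             font=ImageFont.load_default())

    # Bounded rotation: enough to mimic tilted signs, not enough to waste model capacity.
    img = img.rotate(random.uniform(-max_rotation, max_rotation), fillcolor=255)

    # Fake smudges: a few random light-grey blobs.
    draw = ImageDraw.Draw(img)
    for _ in range(random.randint(0, 3)):
        x, y = random.randint(0, size - 1), random.randint(0, size - 1)
        r = random.randint(1, 4)
        draw.ellipse((x - r, y - r, x + r, y + r), fill=random.randint(120, 200))

    # Soften edges with a mild blur to mimic focus and motion blur.
    return img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.0)))
```
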
The third step is to take those recognized letters and look them up in a dictionary to get translations. Since every previous step could have failed in some way, the dictionary lookup needs to be approximate. That way, if we read an ‘S’ as a ‘5’, we’ll still be able to find the word ‘5uper’.
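
One common way to make a lookup forgiving like this is an edit distance that treats typical OCR confusions, such as ‘5’ for ‘S’, as cheap substitutions; the sketch below is a generic illustration of that idea, not the app's actual dictionary code.

```python
# Sketch: approximate dictionary lookup tolerant of OCR confusions (illustrative only).
CONFUSABLE = {("5", "s"), ("0", "o"), ("1", "l"), ("8", "b")}  # hypothetical confusion pairs

def sub_cost(a, b):
    if a == b:
        return 0
    pair = tuple(sorted((a.lower(), b.lower())))
    return 0.2 if pair in CONFUSABLE else 1  # confusable characters are cheap to swap

def edit_distance(s, t):
    # Standard dynamic-programming edit distance with the custom substitution cost.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + sub_cost(a, b)))
        prev = cur
    return prev[-1]

def lookup(word, dictionary, max_dist=1.0):
    best = min(dictionary, key=lambda entry: edit_distance(word.lower(), entry.lower()))
    return best if edit_distance(word.lower(), best.lower()) <= max_dist else None

print(lookup("5uper", ["super", "supper", "soup"]))  # -> "super"
```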

Finally, we render the translation on top of the original words in the same style as the original. We can do this because we’ve already found and read the letters in the image, so we know exactly where they are. We can look at the colors surrounding the letters and use that to erase the original letters. And then we can draw the translation on top using the original foreground color.
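
At its simplest, that overlay step amounts to estimating the background colour around the word, painting over the original letters, and drawing the translated text in the sampled foreground colour. The Pillow sketch below illustrates the idea; it assumes an RGB image and uses invented helper values.

```python
# Sketch: erase the original word and draw the translation in its place (illustrative only).
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def overlay_translation(img, box, translation, fg_color):
    """box = (x, y, w, h) of the original word; fg_color = sampled letter colour."""
    x, y, w, h = box
    pixels = np.asarray(img)  # assumes an RGB image, shape (H, W, 3)

    # Rough background estimate: average colour of the padded region around (and including) the box.
    region = pixels[max(0, y - 2):y + h + 2, max(0, x - 2):x + w + 2]
    bg_color = tuple(int(c) for c in region.reshape(-1, 3).mean(axis=0))

    draw = ImageDraw.Draw(img)
    draw.rectangle((x, y, x + w, y + h), fill=bg_color)        # "erase" the original letters
    font = ImageFont.load_default()
    draw.text((x, y), translation, fill=fg_color, font=font)   # draw the translation on top
    return img
```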

Crunching it down for mobile

Now, if we could do this visual translation in our data centers, it wouldn’t be too hard. But a lot of our users, especially those getting online for the very first time, have slow or intermittent network connections and smartphones starved for computing power. These low-end phones can be about 50 times slower than a good laptop—and a good laptop is already much slower than the data centers that typically run our image recognition systems. So how do we get visual translation on these phones, with no connection to the cloud, translating in real-time as the camera moves around?

We needed to develop a very small neural net, and put severe limits on how much we tried to teach it—in essence, put an upper bound on the density of information it handles. The challenge here was in creating the most effective training data. Since we’re generating our own training data, we put a lot of effort into including just the right data and nothing more. For instance, we want to be able to recognize a letter with a small amount of rotation, but not too much. If we overdo the rotation, the neural network will use too much of its information density on unimportant things. So we put effort into making tools that would give us a fast iteration time and good visualizations. Inside of a few minutes, we can change the algorithms for generating training data, generate it, retrain, and visualize. From there we can look at what kind of letters are failing and why. At one point, we were warping our training data too much, and ‘$’ started to be recognized as ‘S’. We were able to quickly identify that and adjust the warping parameters to fix the problem. It was like trying to paint a picture of letters that you’d see in real life with all their imperfections painted just perfectly.

To achieve real-time performance, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.
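
As a rough illustration of what fitting matrix multiplies into cache means, here is a loop-tiled multiply in NumPy; the real code would be hand-tuned native SIMD on the device, and the block size here is an arbitrary example.

```python
# Sketch: cache-blocked (tiled) matrix multiply, illustrating the idea only.
# Real mobile code would be hand-tuned native SIMD, not Python.
import numpy as np

def blocked_matmul(A, B, block=64):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    # Work on block x block tiles so each tile of A, B and C stays resident in cache
    # while it is reused, instead of streaming whole rows and columns from memory.
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i + block, j:j + block] += (
                    A[i:i + block, p:p + block] @ B[p:p + block, j:j + block]
                )
    return C
```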

In the end, we were able to get our networks to give us significantly better results while running about as fast as our old system—great for translating what you see around you on the fly. Sometimes new technology can seem very abstract, and it's not always obvious what the applications for things like convolutional neural nets could be. We think breaking down language barriers is one great use.
21 Aug 21:34

Hoch zu Ross – Tour by Bicycle 2014

by Georg Kostron

… we're coming by bicycle

Tape machine, accordion, guitar, speakers, mixing desk, spare underpants, an apple and a triangle – everything into the bike trailers! In romantic Passau we'll dock onto the Danube and ride past the silhouettes of the Linz steelworks, the kitschy bliss of the Wachau and the sacred grumpiness of Vienna's coffeehouse waiters, until we arrive at the gates of the Schwechat refinery.

350 kilometres. 1,000 metres of climbing. 7 concerts. An unbelievable number of calories burned. CO2 neutrality depends on our cabbage and lentil consumption. Vegan and socially critical, we ride towards world peace – or towards the end of the world; that depends on which newspaper you read. The two of us, our bikes and their trailers may call it a racetrack, and after all the preparations between studio, rehearsal room and email inbox we are very happy to soon get down to what matters: playing this tour! Giddy-up! Clatter, clatter! Gallop! And onwards.

Sat, 13.09 – Kreuzweis | Passau/DE
Mon, 15.09 – Atelier Galerie Moran | Linz/AT
co-performance with Renate Moran
followed by a wine tasting from the Loitothek, with Gerlinde & Josef Loitelsberger
Tue, 16.09 – Smaragd | Linz/AT
Support: Marvellous Steps "Fingerstyle Guitar meets Beatbox"
Wed, 17.09 – Tratelier | Stockerau/AT
Thu, 18.09 – Cafe Carina | Wien/AT
Fri, 19.09 – Tortuga Pub | Mödling/AT
Sat, 20.09 – Rock Pub | Schwechat/AT

Facebook event for the tour

 

The back of the flyer


19 Jul 11:10

Microsoft: 18,000 layoffs, but were they the right ones?

by CommitStrip

12 Jun 13:39

Trolltunga, Norway | Flickr

by bergh
28 Apr 12:50

The latest chapter for the self-driving car: mastering city street driving

by Emily Wood
Jaywalking pedestrians. Cars lurching out of hidden driveways. Double-parked delivery trucks blocking your lane and your view. At a busy time of day, a typical city street can leave even experienced drivers sweaty-palmed and irritable. We all dream of a world in which city centers are freed of congestion from cars circling for parking and have fewer intersections made dangerous by distracted drivers. That’s why over the last year we’ve shifted the focus of the Google self-driving car project onto mastering city street driving.
Since our last update, we’ve logged thousands of miles on the streets of our hometown of Mountain View, Calif. A mile of city driving is much more complex than a mile of freeway driving, with hundreds of different objects moving according to different rules of the road in a small area. We’ve improved our software so it can detect hundreds of distinct objects simultaneously—pedestrians, buses, a stop sign held up by a crossing guard, or a cyclist making gestures that indicate a possible turn. A self-driving vehicle can pay attention to all of these things in a way that a human physically can’t—and it never gets tired or distracted.

Here’s a video showing how our vehicle navigates some common scenarios near the Googleplex:

As it turns out, what looks chaotic and random on a city street to the human eye is actually fairly predictable to a computer. As we’ve encountered thousands of different situations, we’ve built software models of what to expect, from the likely (a car stopping at a red light) to the unlikely (blowing through it). We still have lots of problems to solve, including teaching the car to drive more streets in Mountain View before we tackle another town, but thousands of situations on city streets that would have stumped us two years ago can now be navigated autonomously.

Our vehicles have now logged nearly 700,000 autonomous miles, and with every passing mile we’re growing more optimistic that we’re heading toward an achievable goal—a vehicle that operates fully without human intervention.

Posted by Chris Urmson, Director, Self-Driving Car Project