Google’s machine learning system TensorFlow has become a go-to tool for researchers and developers all over the world. One developer used it to train a deep Q-network to play Pong, and another used it to build a system that can dream up fake but plausible-looking Chinese characters.
Since Google made the system open-source in November, users have been begging for one improvement. Now, that wish has finally come true. Yesterday, Google released TensorFlow 0.8, a version that features distributed training, allowing developers and researchers to run machine learning jobs across many machines simultaneously. This will shorten the training process for some models from weeks to hours.
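The core idea behind distributed training is data parallelism: each machine computes gradients on its own slice of the training data, and the results are combined into a single model update. The sketch below illustrates this with plain Python on a toy linear-regression problem; the function names and data are hypothetical, and this simulates the workers sequentially rather than using TensorFlow's actual API.

```python
# Illustrative sketch of synchronous data-parallel training, the idea
# behind TensorFlow's distributed mode. All names here are hypothetical;
# this simulates workers in a loop rather than on separate machines.

def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def distributed_step(w, shards, lr=0.01):
    """Each 'worker' computes a gradient on its shard; gradients are averaged."""
    grads = [gradient(w, s) for s in shards]  # runs in parallel on a real cluster
    return w - lr * sum(grads) / len(grads)

# Toy data for y = 3x, split across two simulated workers.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
w = 0.0
for _ in range(200):
    w = distributed_step(w, shards)
print(round(w, 2))  # → 3.0
```

On a real cluster the per-shard gradients are computed concurrently, which is where the speedup comes from: the wall-clock cost of each step is roughly one shard's worth of work plus communication, not the whole dataset's.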
“This means researchers and developers can now do machine learning across dozens or even hundreds of machines, just as we do at Google. This was the top request from TensorFlow users, so it’s going to be a welcome announcement,” Jason Freidenfelds, a global communications representative for Google, told the Observer in an email.
Google has used distributed training across a wide range of its products, allowing it to experiment with models of increasing size and sophistication. To coincide with the TensorFlow 0.8 release, Google has also published a distributed trainer for the Inception image classification neural network in the TensorFlow models repository. Using the distributed trainer, the company trained the Inception network to 78 percent accuracy in less than 65 hours.