BERT is Google's SOTA pre-training language representations. This blog is about running BERT with multiple GPUs. Specifically, we will use the Horovod framework to parrallelize the tasks. We ill list all the changes to the original BERT implementation and highlight a few places that will make or break the performance.
The Lambda Deep Learning Blog
Voltron Data Case Study: Why ML teams are using Lambda Reserved Cloud Clusters
November 01, 2022
How to fine tune stable diffusion: how we made the text-to-pokemon model at Lambda
September 28, 2022
Lambda Cloud Storage is now in open beta: a high speed filesystem for our GPU instances
April 18, 2022
Lambda Teams Up With Razer to Launch the World’s Most Powerful Laptop for Deep Learning
April 11, 2022
You'll learn how to provide your team with GPU training infrastructure at a variety of scales, from a single shared multi-GPU system to a cluster for distributed training.