This presentation is a high-level overview of the different types of training regimes you'll encounter as you move from single-GPU to multi-GPU to multi-node distributed training. It describes where the computation happens, how gradients are communicated, and how model updates are applied and shared across devices.
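To make the data-parallel case concrete, here is a minimal sketch of multi-GPU training using PyTorch's DistributedDataParallel. The presentation itself is framework-agnostic; the model, dataset, and hyperparameters below are placeholders chosen for illustration, and the script assumes it is launched with one process per GPU (e.g. via torchrun). Each replica computes its forward and backward pass locally, gradients are averaged across ranks during backward(), and every replica then applies the same optimizer step.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# The model and batches are placeholders; launch with one process per GPU,
# e.g.: torchrun --nproc_per_node=NUM_GPUS this_script.py

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each process holds a full replica of the model on its own GPU:
    # the computation happens locally on each device.
    model = nn.Linear(1024, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10):
        # Each rank draws its own shard of the data (random placeholder batch).
        inputs = torch.randn(32, 1024, device=local_rank)
        targets = torch.randint(0, 10, (32,), device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        # backward() triggers an all-reduce: gradients are averaged across
        # all ranks, so every replica applies an identical update.
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same pattern extends to multi-node training: only the process launch and the communication backend configuration change, while the per-step computation, gradient all-reduce, and update logic stay the same.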