The Lambda Deep Learning Blog

This presentation is a high-level overview of the training regimes you'll encounter as you move from single-GPU to multi-GPU to multi-node distributed training. It describes where the computation happens, how the gradients are communicated, and how the models are updated and synchronized across devices.

A Gentle Introduction to Multi GPU and Multi Node Distributed Training