Introducing the Lambda Echelon, a GPU cluster designed for AI. It comes with the compute, storage, network, power, and support you need to tackle large scale deep learning tasks. Echelon offers a turn-key solution to faster training, faster hyperparameter search, and faster inference.
The Lambda Deep Learning Blog
Categories
- gpu-cloud (25)
- tutorials (24)
- benchmarks (22)
- announcements (19)
- lambda cloud (13)
- NVIDIA H100 (12)
- hardware (12)
- tensorflow (9)
- NVIDIA A100 (8)
- gpus (8)
- company (7)
- LLMs (6)
- deep learning (6)
- hyperplane (6)
- news (6)
- training (6)
- gpu clusters (5)
- CNNs (4)
- generative networks (4)
- presentation (4)
- research (4)
- rtx a6000 (4)
Recent Posts
This presentation is a high-level overview of the different types of training regimes you'll encounter as you move from single GPU to multi GPU to multi node distributed training. It describes where the computation happens, how the gradients are communicated, and how the models are updated and communicated.
Published 05/31/2019 by Stephen Balaban
BERT is Google's SOTA pre-training language representations. This blog is about running BERT with multiple GPUs. Specifically, we will use the Horovod framework to parrallelize the tasks. We ill list all the changes to the original BERT implementation and highlight a few places that will make or break the performance.
Published 02/06/2019 by Chuan Li