The Lambda Deep Learning Blog

Recent Posts

How To Use mpirun to Launch a LLaMA Inference Job Across Multiple Cloud Instances

This post shows how to use mpirun to launch a LLaMA inference job across multiple cloud instances if you do not have a multi-GPU workstation or server. Despite being more memory efficient than previous foundation language models, LLaMA still requires multiple GPUs to run inference.
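The kind of launch the post describes can be sketched with Open MPI's mpirun and a hostfile. This is a minimal illustration only: the hostnames, slot counts, and script name are placeholder assumptions, not taken from the post.

```shell
# Hypothetical sketch: run one inference rank on each of two cloud
# instances listed in a hostfile. Hostnames, slot counts, and the
# script name are illustrative assumptions.
cat > hostfile <<'EOF'
node-001 slots=1
node-002 slots=1
EOF

mpirun --hostfile hostfile -np 2 \
    -x PATH -x LD_LIBRARY_PATH \
    python inference.py
```

The `-x` flags forward environment variables from the launching shell to the remote ranks, which is typically needed so each instance can find the same Python environment and CUDA libraries.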

Published 03/14/2023 by Chuan Li

Hugging Face x Lambda: Whisper Fine-Tuning Event

Lambda and Hugging Face are collaborating on a 2-week sprint to fine-tune OpenAI's Whisper model in as many languages as possible.

Published 12/01/2022 by Chuan Li

NVIDIA GeForce RTX 4090 vs RTX 3090 Deep Learning Benchmark

In this blog post, we benchmark the RTX 4090 to assess its deep learning training performance and compare it against the RTX 3090, the flagship consumer GPU of the previous Ampere generation.

Published 10/31/2022 by Chuan Li

NVIDIA H100 GPU - Deep Learning Performance Analysis

We discuss the performance and scalability of H100 GPUs, and the reasons to upgrade your ML infrastructure for this upcoming big release from NVIDIA.

Published 10/05/2022 by Chuan Li

Multi node PyTorch Distributed Training Guide For People In A Hurry

The goal of this tutorial is to give a summary of how to write and launch PyTorch distributed data parallel jobs across multiple nodes, with working examples using the torch.distributed.launch, torchrun, and mpirun APIs.
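A multi-node launch with one of the APIs the guide covers, torchrun, can be sketched as follows. The node count, GPU count, rendezvous address, and script name here are illustrative assumptions, not values from the guide.

```shell
# Hypothetical sketch: launch the same DDP training script with
# torchrun, running this command once on each of two nodes.
# Node/GPU counts, the rendezvous endpoint, and the script name
# are illustrative assumptions.
torchrun --nnodes=2 --nproc_per_node=4 \
    --rdzv_id=job1 --rdzv_backend=c10d \
    --rdzv_endpoint=10.0.0.1:29500 \
    train.py
```

Each node runs the identical command; the c10d rendezvous backend at the given endpoint coordinates the ranks so the script itself only needs to call `torch.distributed.init_process_group`.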

Published 08/26/2022 by Chuan Li

Setting Up A Kubernetes Run:AI Cluster on Lambda Cloud

This blog describes how to set up a Run:AI cluster on Lambda Cloud with one or more cloud instances.

Published 06/03/2022 by Chuan Li

Best GPU for Deep Learning in 2022 (so far)

While waiting for NVIDIA's next-generation consumer and professional GPUs, we decided to write a blog about the best GPU for Deep Learning currently available, as of March 2022.

Published 02/28/2022 by Chuan Li

NVIDIA A40 Deep Learning Benchmarks

NVIDIA® A40 GPUs are now available on Lambda Scalar servers. In this post, we benchmark the A40 with 48 GB of GDDR6 VRAM to assess its training performance using PyTorch and TensorFlow. We then compare it against the NVIDIA V100, RTX 8000, RTX 6000, and RTX 5000.

Published 11/30/2021 by Chuan Li

Tesla A100 Server Total Cost of Ownership Analysis

This post discusses the Total Cost of Ownership (TCO) for a variety of Lambda A100 servers and clusters. We first calculate the TCO for individual Hyperplane-A100 servers and compare the cost with renting an AWS p4d.24xlarge instance, which has a similar hardware and software setup. We then walk you through the cost of building and operating A100 clusters.

Published 09/22/2021 by Chuan Li

RTX A6000 vs RTX 3090 Deep Learning Benchmarks

PyTorch benchmarks of the RTX A6000 and RTX 3090 for convnets and language models, covering both 32-bit and mixed-precision performance.

Published 08/09/2021 by Chuan Li

OpenAI's GPT-3 Language Model: A Technical Overview

Chuan Li, PhD reviews GPT-3, the new NLP model from OpenAI. The paper empirically shows that language model performance scales as a power law with model size, dataset size, and the amount of computation.

Published 06/03/2020 by Chuan Li

TensorFlow 2.0 Tutorial 01: Basic Image Classification

This tutorial explains the basics of TensorFlow 2.0 with image classification as the example: 1) building a data pipeline with the Dataset API; 2) training, evaluating, saving, and restoring models with Keras; 3) multi-GPU training with a distributed strategy; 4) customized training with callbacks.

Published 10/01/2019 by Chuan Li

Setting up Horovod + Keras for Multi-GPU training

This blog walks you through the steps of setting up a Horovod + Keras environment for multi-GPU training.

Prerequisites:

* Hardware: a machine with at least two GPUs
* Basic software: Ubuntu (18.04 or 16.04), NVIDIA driver (418.43), CUDA (10.0)
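Once the environment is in place, a Horovod training run on a two-GPU machine can be sketched with the horovodrun launcher. The script name is an illustrative assumption.

```shell
# Hypothetical sketch: launch two Horovod worker processes on the
# local machine, one per GPU. The script name is an illustrative
# assumption; each worker pins itself to one GPU via its local rank.
horovodrun -np 2 -H localhost:2 python train.py
```

Here `-np 2` sets the total number of processes and `-H localhost:2` declares that the local host provides two slots, matching the two-GPU prerequisite above.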

Published 08/28/2019 by Chuan Li

