The Lambda Deep Learning Blog

Benchmarking ZeRO-Inference on the NVIDIA GH200 Grace Hopper Superchip

Benchmarks comparing the inference performance of the NVIDIA GH200 Grace Hopper Superchip, enhanced by ZeRO-Inference, to NVIDIA H100 and A100 Tensor Core GPUs.

Published 12/20/2023 by Chuan Li

benchmarks NVIDIA A100 NVIDIA H100

Unleashing the power of Transformers with NVIDIA Transformer Engine

Benchmarks of NVIDIA’s Transformer Engine, which boosts FP8 performance by an impressive 60% in tests of GPT-3-style models on NVIDIA H100 Tensor Core GPUs.

Published 11/21/2023 by Chuan Li

benchmarks NVIDIA A100 NVIDIA H100

DeepChat 3-Step Training At Scale: Lambda’s Instances of NVIDIA H100 SXM5 vs A100 SXM4

GPU benchmarks comparing Lambda’s NVIDIA H100 SXM5 and NVIDIA A100 SXM4 instances on DeepChat’s 3-step training example.

Published 10/12/2023 by Chuan Li

benchmarks NVIDIA A100 NVIDIA H100 flashattention-2

How FlashAttention-2 Accelerates LLMs on NVIDIA H100 and A100 GPUs

How to use FlashAttention-2 on Lambda Cloud, including H100 vs A100 benchmark results for training GPT-3-style models with it.

Published 08/24/2023 by Chuan Li

NVIDIA A100 gpu-cloud gpu clusters

Voltron Data Case Study: Why ML teams are using Lambda Reserved Cloud Clusters

In this post, we outline the benefits of our new Reserved Cloud Cluster and give an example of how Voltron Data is using it to work with large datasets.

Published 11/01/2022 by Lauren Watkins

hardware aws tco benchmarks hyperplane NVIDIA A100

Tesla A100 Server Total Cost of Ownership Analysis

This post discusses the Total Cost of Ownership (TCO) for a variety of Lambda A100 servers and clusters. We calculate the TCO for individual Hyperplane-A100 servers, compare the cost with renting an AWS p4d.24xlarge instance, and walk through the cost of building and operating A100 clusters.

Published 09/22/2021 by Chuan Li

hardware hpc BERT hyperplane infiniband NVIDIA A100 gpu clusters

Lambda Echelon – a turn-key GPU cluster for your ML team

Introducing the Lambda Echelon, a GPU cluster designed for AI. It comes with the compute, storage, network, power, and support you need to tackle large-scale deep learning tasks. Echelon offers a turn-key solution to faster training, faster hyperparameter search, and faster inference.

Published 10/06/2020 by Stephen Balaban

benchmarks gpus NVIDIA A100

NVIDIA A100 GPU Benchmarks for Deep Learning

Benchmarks for ResNet-152, Inception v3, Inception v4, VGG-16, AlexNet, SSD300, and ResNet-50 using the NVIDIA A100 GPU and DGX A100 server.

Published 05/22/2020 by Stephen Balaban
