The Lambda Deep Learning Blog

Recent Posts

How To Use mpirun to Launch a LLaMA Inference Job Across Multiple Cloud Instances

This post shows how to use mpirun to launch a LLaMA inference job across multiple cloud instances if you do not have a multi-GPU workstation or server. Despite being more memory efficient than previous foundation language models, LLaMA still requires multiple GPUs to run inference.
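The kind of launch the post describes can be sketched with Open MPI's mpirun and a hostfile. This is a minimal illustration only: the hostnames, slot counts, and script name are placeholder assumptions, not taken from the post.

```shell
# Hypothetical sketch: run one inference rank on each of two cloud
# instances listed in a hostfile. Hostnames, slot counts, and the
# script name are illustrative assumptions.
cat > hostfile <<'EOF'
node-001 slots=1
node-002 slots=1
EOF

mpirun --hostfile hostfile -np 2 \
    -x PATH -x LD_LIBRARY_PATH \
    python inference.py
```

The `-x` flags forward environment variables from the launching shell to the remote ranks, which is typically needed so each instance can find the same Python environment and CUDA libraries.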

Published 03/14/2023 by Chuan Li

Hugging Face x Lambda: Whisper Fine-Tuning Event

Lambda and Hugging Face are collaborating on a 2-week sprint to fine-tune OpenAI's Whisper model in as many languages as possible.

Published 12/01/2022 by Chuan Li

NVIDIA GeForce RTX 4090 vs RTX 3090 Deep Learning Benchmark

In this blog post, we benchmark the RTX 4090 to assess its deep learning training performance and compare it against the RTX 3090, the flagship consumer GPU of the previous Ampere generation.

Published 10/31/2022 by Chuan Li

NVIDIA H100 GPU - Deep Learning Performance Analysis

We discuss the performance and scalability of H100 GPUs, and the reasons to upgrade your ML infrastructure for this upcoming big release from NVIDIA.

Published 10/05/2022 by Chuan Li

Multi node PyTorch Distributed Training Guide For People In A Hurry

The goal of this tutorial is to give a summary of how to write and launch PyTorch distributed data parallel jobs across multiple nodes, with working examples using the torch.distributed.launch, torchrun, and mpirun APIs.
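A multi-node launch with one of the APIs the guide covers, torchrun, can be sketched as follows. The node count, GPU count, rendezvous address, and script name here are illustrative assumptions, not values from the guide.

```shell
# Hypothetical sketch: launch the same DDP training script with
# torchrun, running this command once on each of two nodes.
# Node/GPU counts, the rendezvous endpoint, and the script name
# are illustrative assumptions.
torchrun --nnodes=2 --nproc_per_node=4 \
    --rdzv_id=job1 --rdzv_backend=c10d \
    --rdzv_endpoint=10.0.0.1:29500 \
    train.py
```

Each node runs the identical command; the c10d rendezvous backend at the given endpoint coordinates the ranks so the script itself only needs to call `torch.distributed.init_process_group`.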

Published 08/26/2022 by Chuan Li

Setting Up A Kubernetes Run:AI Cluster on Lambda Cloud

This blog describes how to set up a Run:AI cluster on Lambda Cloud with one or more cloud instances.

Published 06/03/2022 by Chuan Li

Best GPU for Deep Learning in 2022 (so far)

While waiting for NVIDIA's next-generation consumer and professional GPUs, we decided to write a blog about the best GPU for Deep Learning currently available, as of March 2022.

Published 02/28/2022 by Chuan Li

NVIDIA A40 Deep Learning Benchmarks

NVIDIA® A40 GPUs are now available on Lambda Scalar servers. In this post, we benchmark the A40 with 48 GB of GDDR6 VRAM to assess its training performance using PyTorch and TensorFlow. We then compare it against the NVIDIA V100, RTX 8000, RTX 6000, and RTX 5000.

Published 11/30/2021 by Chuan Li

Tesla A100 Server Total Cost of Ownership Analysis

This post discusses the Total Cost of Ownership (TCO) for a variety of Lambda A100 servers and clusters. We first calculate the TCO for individual Hyperplane-A100 servers and compare the cost with renting an AWS p4d.24xlarge instance, which has a similar hardware and software setup. We then walk you through the cost of building and operating A100 clusters.

Published 09/22/2021 by Chuan Li

RTX A6000 vs RTX 3090 Deep Learning Benchmarks

PyTorch benchmarks of the RTX A6000 and RTX 3090 for convnets and language models, covering both 32-bit and mixed-precision performance.

Published 08/09/2021 by Chuan Li

OpenAI's GPT-3 Language Model: A Technical Overview

Chuan Li, PhD reviews GPT-3, the new NLP model from OpenAI. The paper empirically shows that language model performance scales as a power law with model size, dataset size, and the amount of computation.

Published 06/03/2020 by Chuan Li

TensorFlow 2.0 Tutorial 01: Basic Image Classification

This tutorial explains the basics of TensorFlow 2.0 with image classification as the example: 1) building a data pipeline with the Dataset API; 2) training, evaluating, saving, and restoring models with Keras; 3) multi-GPU training with a distributed strategy; 4) customized training with callbacks.

Published 10/01/2019 by Chuan Li

Setting up Horovod + Keras for Multi-GPU training

This blog walks you through the steps of setting up a Horovod + Keras environment for multi-GPU training.

Prerequisites:

* Hardware: a machine with at least two GPUs
* Basic software: Ubuntu (18.04 or 16.04), NVIDIA driver (418.43), CUDA (10.0)
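Once the environment is in place, a Horovod training run on a two-GPU machine can be sketched with the horovodrun launcher. The script name is an illustrative assumption.

```shell
# Hypothetical sketch: launch two Horovod worker processes on the
# local machine, one per GPU. The script name is an illustrative
# assumption; each worker pins itself to one GPU via its local rank.
horovodrun -np 2 -H localhost:2 python train.py
```

Here `-np 2` sets the total number of processes and `-H localhost:2` declares that the local host provides two slots, matching the two-GPU prerequisite above.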

Published 08/28/2019 by Chuan Li

