PyTorch benchmarks of the RTX A6000 and RTX 3090 for convnets and language models - both 32-bit and mixed-precision performance.
The Lambda Deep Learning Blog
Recent Posts
Chuan Li, PhD reviews GPT-3, the new NLP model from OpenAI. The technical overview covers how GPT-3 was trained, GPT-2 vs. GPT-3, and GPT-3 performance.
Published 06/03/2020 by Chuan Li
Scaling out deep learning infrastructure becomes easier with 16 NVIDIA Tesla V100 GPUs and preinstalled frameworks like TensorFlow, Keras, and PyTorch.
Published 12/19/2019 by Chuan Li
This tutorial explains the basics of TensorFlow 2.0 with image classification as the example. 1) Data pipeline with dataset API. 2) Train, evaluate, save and restore models with Keras. 3) Multiple-GPU with distributed strategy. 4) Customized training with callbacks.
Published 10/01/2019 by Chuan Li
This tutorial will walk you through how to set up a working environment for multi-GPU training with Horovod and Keras.
Published 08/28/2019 by Chuan Li
Resource utilization tracking can help machine learning engineers improve their software pipeline and model performance. This blog discusses how to use Weights & Biases to inspect the efficiency of TensorFlow training jobs.
Published 08/12/2019 by Chuan Li
Distributed training allows deep learning tasks to scale, so larger models can be trained on larger datasets. In this tutorial, we explain how to do distributed training across multiple nodes.
Published 06/07/2019 by Chuan Li
This tutorial explains how early stopping is implemented in TensorFlow. The key lesson is to use the tf.keras.callbacks.EarlyStopping callback. Early stopping is triggered by monitoring whether a certain quantity (e.g. validation loss) has stopped improving over a recent window of epochs.
Published 06/06/2019 by Chuan Li
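The monitoring logic the early-stopping post describes can be sketched in plain Python (this is an illustrative sketch of the patience mechanism, not the actual tf.keras.callbacks.EarlyStopping implementation; the function name and parameters are hypothetical):

```python
def should_stop(history, patience=3, min_delta=0.0):
    """Return True if the monitored quantity (assumed lower-is-better,
    e.g. validation loss) has not improved in the last `patience` epochs.

    history: list of per-epoch values of the monitored quantity.
    min_delta: minimum decrease that counts as an improvement.
    """
    if len(history) <= patience:
        return False  # not enough epochs yet to judge
    best_before = min(history[:-patience])   # best value before the window
    recent_best = min(history[-patience:])   # best value inside the window
    # Stop if the recent window failed to beat the earlier best by min_delta.
    return recent_best > best_before - min_delta
```

In tf.keras, the equivalent behavior comes from passing EarlyStopping(monitor="val_loss", patience=3) to model.fit via the callbacks argument.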
This tutorial explains how to use checkpoints to save and restore TensorFlow models during training. The key is to use the tf.keras.callbacks.ModelCheckpoint callback to save the model, and to set initial_epoch in the model.fit call to restore the model from a pre-saved checkpoint.
Published 06/06/2019 by Chuan Li
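The resume step the checkpoint post describes (choosing initial_epoch from the last saved checkpoint) can be sketched in plain Python. The "ckpt-&lt;epoch&gt;.h5" filename convention below is a hypothetical assumption for illustration, not one prescribed by the post:

```python
import os
import re

def latest_checkpoint_epoch(ckpt_dir):
    """Return the highest epoch number among saved checkpoint files
    named like 'ckpt-<epoch>.h5', or 0 if no checkpoint exists."""
    epochs = []
    for name in os.listdir(ckpt_dir):
        m = re.fullmatch(r"ckpt-(\d+)\.h5", name)
        if m:
            epochs.append(int(m.group(1)))
    return max(epochs, default=0)
```

The returned value would then be passed as initial_epoch to model.fit (after loading the corresponding weights), so training continues from where it left off instead of restarting at epoch 0.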
This tutorial explains how to do transfer learning with TensorFlow 2. We cover handling customized datasets, restoring the backbone with the Keras applications API, and restoring the backbone from disk.
Published 06/05/2019 by Chuan Li
A cost and speed comparison between the Lambda Hyperplane 8 V100 GPU Server and AWS p3 GPU instances. A very similar comparison to the DGX-1.
Published 02/11/2019 by Chuan Li
This tutorial is about making a character-based text generator using a simple two-layer LSTM. It will walk you through the data preparation and the network architecture. TensorFlow implementation is available at the end of the tutorial.
Published 02/08/2019 by Chuan Li
BERT is Google's state-of-the-art method for pre-training language representations. This blog is about running BERT with multiple GPUs. Specifically, we will use the Horovod framework to parallelize the tasks. We will list all the changes to the original BERT implementation and highlight a few places that will make or break the performance.
Published 02/06/2019 by Chuan Li