The Lambda Deep Learning Blog

Recent Posts

ShadeRunner: Chrome plugin for enhanced on-page research

Maximize online research efficiency with ShadeRunner, a Chrome plugin featuring text highlighting, paragraph summarization, and topic suggestions.

Published 02/13/2024 by David Hartmann

Setting Up A Kubernetes Run:AI Cluster on Lambda Cloud

This post describes how to set up a Run:AI cluster on Lambda Cloud with one or more cloud instances.

Published 06/03/2022 by Chuan Li

Lambda Cloud Storage is now in open beta: a high speed filesystem for our GPU instances

After a period of closed beta, persistent storage for Lambda GPU Cloud is now available for all A6000 and V100 instances in an extended open beta period.

Published 04/19/2022 by Kathy Bui

Lambda's Deep Learning Curriculum

This curriculum provides an overview of free online resources for learning about deep learning. It includes courses, books, and even important people to follow.

Published 11/01/2021 by Stephen Balaban

NVIDIA NGC Tutorial: Run a PyTorch Docker Container using nvidia-container-toolkit on Ubuntu

This tutorial shows you how to install Docker with GPU support on Ubuntu Linux. To get GPU passthrough to work, you'll need docker, nvidia-container-toolkit, Lambda Stack, and a docker image with a GPU accelerated library.

Published 07/19/2021 by Stephen Balaban

How to Transfer Data to Lambda Cloud GPU Instances

This guide will walk you through how to load data from various sources onto your Lambda Cloud GPU instance. If you're looking for how to get started and SSH into your instance for the first time, check out our Getting Started Guide.

Published 05/03/2020 by Remy Guercio

Getting Started Guide — Lambda Cloud GPU Instances

This guide walks you through launching a Lambda Cloud GPU instance and logging in over SSH. It assumes that you're running either macOS or Linux.

Published 05/03/2020 by Remy Guercio

Setting up a Mellanox InfiniBand Switch (SB7800 36-port EDR)

This tutorial walks through the steps required to set up a Mellanox SB7800 36-port switch. The subnet manager discovers and configures the devices running on the InfiniBand fabric. This guide shows you how to set it up via the command line or via the web browser.

Published 10/30/2019 by Stephen Balaban

TensorFlow 2.0 Tutorial 01: Basic Image Classification

This tutorial explains the basics of TensorFlow 2.0, using image classification as the example. It covers: 1) building a data pipeline with the Dataset API; 2) training, evaluating, saving, and restoring models with Keras; 3) multi-GPU training with a distribution strategy; and 4) customized training with callbacks.

Published 10/01/2019 by Chuan Li

Setting up Horovod + Keras for Multi-GPU training

This tutorial walks you through how to set up a working environment for multi-GPU training with Horovod and Keras.

Published 08/28/2019 by Chuan Li

Tracking system resource (GPU, CPU, etc.) utilization during training with the Weights & Biases Dashboard

Resource utilization tracking can help machine learning engineers improve their software pipeline and model performance. This post discusses how to use Weights & Biases to inspect the efficiency of TensorFlow training jobs.

Published 08/12/2019 by Chuan Li

TensorFlow 2.0 Tutorial 05: Distributed Training across Multiple Nodes

Distributed training scales up deep learning tasks so that larger models can be trained on larger datasets. This tutorial explains how to run distributed training across multiple nodes.

Published 06/07/2019 by Chuan Li

TensorFlow 2.0 Tutorial 04: Early Stopping

This tutorial explains how early stopping is implemented in TensorFlow. The key lesson is to use the tf.keras.callbacks.EarlyStopping callback. Early stopping is triggered when a monitored quantity (for example, validation loss) has not improved over a recent window of epochs.

Published 06/06/2019 by Chuan Li
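
The early stopping rule described above can be sketched in a few lines of plain Python. This is a minimal illustration and not code from the tutorial: the function name `early_stopping_epoch` and the sample loss values are hypothetical, and the `patience` and `min_delta` arguments only loosely mirror the parameters of the same names on the Keras callback.

```python
def early_stopping_epoch(losses, patience=2, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Hypothetical helper for illustration: stops once the monitored loss
    has failed to improve by more than `min_delta` for `patience`
    consecutive epochs.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value (0.65) occurs at epoch 2,
# followed by two epochs without improvement.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64]
print(early_stopping_epoch(val_losses, patience=2))  # prints 4
```

Note that with `patience=2` the run stops at epoch 4 and never sees the better loss at epoch 5, which is the usual trade-off when choosing a small patience value.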
