The Lambda Deep Learning Blog

Recent Posts

Lambda Raises $320M to Build a GPU Cloud for AI

Lambda raised a $320M Series C at a $1.5B valuation to expand our GPU cloud and further our mission to build the #1 AI compute platform in the world.

Published 02/15/2024 by Stephen Balaban

Lambda Raises $44M to Build the World’s Best Cloud for Training AI

Lambda secured a $44 million Series B to accelerate the growth of our AI cloud. Funds will be used to deploy new H100 GPU capacity with high-speed network interconnects and develop features that will make Lambda the best cloud in the world for training AI.

Published 03/21/2023 by Stephen Balaban

Careers at Lambda

Lambda is hiring! Join a fast-growing startup providing deep learning hardware, software, and cloud services to the world's leading companies.

Published 06/26/2022 by Stephen Balaban

Lambda's Machine Learning Infrastructure Playbook and Best Practices

If you're trying to figure out how to build and scale your team's deep learning infrastructure, this presentation is for you. We walk you through the decisions associated with building cloud, on-prem, and hybrid infrastructure for your team.

Published 02/23/2022 by Stephen Balaban

Deep learning is the future of gaming

Deep learning is the most important technology to impact gaming since the advent of 3D graphics. This short video presentation walks you through a few of the technologies that will deliver unbelievable gaming experiences in the near future.

Published 01/04/2022 by Stephen Balaban

Lambda's Deep Learning Curriculum

This curriculum provides an overview of free online resources for learning about deep learning. It includes courses, books, and even important people to follow.

Published 11/01/2021 by Stephen Balaban

NVIDIA NGC Tutorial: Run a PyTorch Docker Container using nvidia-container-toolkit on Ubuntu

This tutorial shows you how to install Docker with GPU support on Ubuntu Linux. To get GPU passthrough to work, you'll need Docker, nvidia-container-toolkit, Lambda Stack, and a Docker image with a GPU-accelerated library.

Published 07/19/2021 by Stephen Balaban
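The flow the tutorial summary describes can be sketched in a few commands. This is a hedged sketch, not the tutorial itself: the package names and the NGC image tag are assumptions, and installing nvidia-container-toolkit normally requires NVIDIA's apt repository to be configured first.

```shell
# Sketch of the setup flow; package names and the NGC image tag are
# assumptions; check the tutorial for the current ones.

# Lambda Stack supplies the NVIDIA driver. Install Docker and the toolkit
# (assumes NVIDIA's apt repository has already been added):
sudo apt-get update
sudo apt-get install -y docker.io nvidia-container-toolkit
sudo systemctl restart docker

# Verify GPU passthrough with a GPU-accelerated image from NGC:
sudo docker run --rm --gpus all nvcr.io/nvidia/pytorch:24.01-py3 \
    python -c "import torch; print(torch.cuda.is_available())"
```

If the last command prints `True`, the container can see the host GPUs.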

Lambda raises $24.5M to build GPU cloud and deep learning hardware

Lambda secured $24.5M in financing, including a $15M Series A equity round and a $9.5M debt facility that will allow for the growth of Lambda GPU Cloud and the expansion of Lambda's on-prem AI infrastructure software products. Read more details in the post.

Published 07/16/2021 by Stephen Balaban

Lambda Echelon – a turn-key GPU cluster for your ML team

Introducing the Lambda Echelon, a GPU cluster designed for AI. It comes with the compute, storage, network, power, and support you need to tackle large-scale deep learning tasks. Echelon offers a turn-key solution to faster training, faster hyperparameter search, and faster inference.

Published 10/06/2020 by Stephen Balaban

NVIDIA A100 GPU Benchmarks for Deep Learning

Benchmarks for ResNet-152, Inception v3, Inception v4, VGG-16, AlexNet, SSD300, and ResNet-50 using the NVIDIA A100 GPU and DGX A100 server.

Published 05/22/2020 by Stephen Balaban

Hyperplane-16 InfiniBand Cluster Total Cost of Ownership Analysis

This post uses our Total Cost of Ownership (TCO) calculator to examine the cost of a variety of Lambda Hyperplane-16 clusters, with options for 100 Gb/s EDR InfiniBand networking, storage servers, and complete rack-stack-label-cable service.

Published 04/07/2020 by Stephen Balaban

Setting up a Mellanox InfiniBand Switch (SB7800 36-port EDR)

This tutorial walks through the steps required to set up a Mellanox SB7800 36-port switch. The subnet manager discovers and configures the devices running on the InfiniBand fabric. This guide shows you how to set it up via the command line or via the web browser.

Published 10/30/2019 by Stephen Balaban

A Gentle Introduction to Multi-GPU and Multi-Node Distributed Training

This presentation is a high-level overview of the different types of training regimes you'll encounter as you move from single-GPU to multi-GPU to multi-node distributed training. It describes where the computation happens, how the gradients are communicated, and how the models are updated and communicated.

Published 05/31/2019 by Stephen Balaban
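The synchronous data-parallel regime the presentation covers can be illustrated in plain Python: each worker computes a gradient on its own data shard, the gradients are averaged across workers (the role an all-reduce plays in NCCL or MPI), and every replica applies the identical update. This is a toy sketch with made-up data and no real communication library, not code from the presentation.

```python
# Toy sketch of synchronous data-parallel training: each "worker" holds a
# replica of one parameter w, computes a gradient on its own shard, and an
# all-reduce averages the gradients so every replica applies the same update.

def local_gradient(w, shard):
    # Gradient of mean squared error 0.5*(w*x - y)^2 over this worker's shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Stand-in for an NCCL/MPI all-reduce: average the per-worker gradients.
    return sum(grads) / len(grads)

def train_step(w, shards, lr=0.1):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel
    g = all_reduce_mean(grads)                      # communicated
    return w - lr * g                               # identical update on every replica

# Two workers, data drawn from y = 2x; w converges toward 2.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 3))  # → 2.0
```

Frameworks such as PyTorch DistributedDataParallel or Horovod implement the same pattern, overlapping the gradient communication with the backward pass.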

...
