This tutorial summarizes how to write and launch PyTorch distributed data parallel jobs across multiple nodes, with working examples using torch.distributed.launch, torchrun, and mpirun.
The Lambda Deep Learning Blog
Featured Posts
Recent Posts
How Lambda Cloud can save a machine learning engineer time and money when training state-of-the-art YOLOv5 object detection models.
Published 08/15/2022 by Cooper L
Lambda is hiring! Join a fast-growing startup providing deep learning hardware, software, and cloud services to the world's leading companies.
Published 06/26/2022 by Stephen Balaban
This post describes how to set up a RunAI cluster on Lambda Cloud with one or more cloud instances.
Published 06/03/2022 by Chuan Li
After a period of closed beta, persistent storage for Lambda GPU Cloud is now available for all A6000 and V100 instances in an extended open beta period.
Published 04/19/2022 by Kathy Bui
New laptop offers the industry’s most powerful mobile workstation for deep learning, enabling ML engineers to immediately focus on achieving breakthroughs in AI/ML anytime, anywhere.
Published 04/12/2022 by Rick
Lambda has been selected as an NVIDIA Partner Network (NPN) Solutions Integration Partner of the Year for 2021, the second consecutive year the deep learning infrastructure provider has been chosen for this top honor.
Published 04/05/2022 by Rick
A comparison of the best tools for monitoring GPU usage and performance statistics.
Published 03/29/2022 by Justin Pinkney
While waiting for NVIDIA's next-generation consumer and professional GPUs, we decided to write a post about the best GPU for deep learning available as of March 2022.
Published 02/28/2022 by Chuan Li
If you're trying to figure out how to build and scale your team's deep learning infrastructure, this presentation is for you. We walk you through the decisions associated with building cloud, on-prem, and hybrid infrastructure for your team. We've distilled best practices learned from helping thousands of teams build their
Published 02/23/2022 by Stephen Balaban
Deep learning is the most important technology to impact gaming since the advent of 3D graphics. This short video presentation walks you through a few of the technologies that will deliver unbelievable gaming experiences in the near future. Research covered in this presentation: 1. Photorealistic neural rendering 2. Deepfakes for
Published 01/04/2022 by Stephen Balaban
Today, we will show how to install Anaconda and how to use YAML files for versioning environments. Anaconda is a distribution of Python for machine learning and data science that simplifies package management and deployment. It is an invaluable tool for controlling the versioning of packages in your code
Published 12/31/2021 by Mark Dalton
NVIDIA® A40 GPUs are now available on Lambda Scalar servers (https://lambdalabs.com/products/scalar). In this post, we benchmark the A40 with 48 GB of GDDR6 VRAM to assess its training performance using PyTorch and TensorFlow. We then compare it against the NVIDIA V100, RTX 8000, RTX 6000, and RTX 5000.
Published 11/30/2021 by Chuan Li