GPU clusters designed for deep learning
Accelerate your team's AI progress with a Lambda Echelon HPC cluster.
Echelon clusters check all of the boxes
[✓] NVIDIA GPU Compute
We use Lambda Scalar and Hyperplane servers with NVIDIA Tensor Core GPUs as the building blocks for your Echelon cluster. With an Echelon cluster, your team will be training in minutes instead of days.
[✓] Fast, All-Flash Storage
Lambda-engineered network storage servers provide rapid access to your training and inference data. The Lambda storage administration dashboard makes managing your cluster a breeze.
[✓] HPC Networking
Rely on Lambda's HPC engineers to design an optimal network topology for you. Whether you want Ethernet or InfiniBand, Echelon network designs leverage GPUDirect RDMA to accelerate multi-node distributed training and data access.
[✓] Enterprise White Glove Support
Lambda Echelon comes racked, stacked, labeled, and cabled. It doesn't stop there: every cluster is also backed by the engineers who designed it, people who love to go above and beyond. We abstract away the complexity of HPC clusters so you can focus on what you do best. Just roll it out of the crate, plug it in, and start training.
A single vendor for your entire cluster
One relationship to rule them all
Working with Lambda means you won't be duct-taping a solution together from multiple vendors. Your cluster’s compute, network, and storage are all provided by Lambda, meaning your procurement process is greatly simplified.
Compute, Storage, and Networking that work together
Echelon clusters have compute, storage, and networking architectures that are validated by Lambda HPC engineers and detailed in our whitepaper.
Shipped to you, ready to roll
Echelon clusters can be shipped to you fully assembled, ready to roll out of their rack crate and onto the floor of your data center. All you need to do is plug them in.
A cluster of Lambda GPU servers
Designed for your team’s use case
Lambda is a team of HPC experts and published AI researchers. Whether you’re looking to set up a traditional HPC cluster, or a cluster for distributed training of language models, we’ll engineer a cluster for your precise use case.
The Echelon cluster design process begins with a Lambda Hyperplane or Lambda Scalar server configuration. This becomes the core compute node used throughout the cluster.
All-Flash Network Storage with Management Dashboard
Access your data sets and checkpoints at the speed of flash
Storage clusters can often become the bottleneck in large-scale deployments. The Echelon reference design combines a high-speed storage fabric with local NVMe flash caches to dramatically speed up data transfer rates during training.
Support for dozens of vendors
Lambda has pre-existing OEM relationships with practically every storage appliance provider in the world.
Manage your storage cluster with an easy-to-use web dashboard
Manage your cluster’s storage via the Lambda storage management dashboard. You can easily create and destroy network attached storage volumes, spin up virtual machines on your storage devices, and manage access control.
Blazing fast HPC networking
100% port-to-port bandwidth spine & leaf topology
Echelon compute nodes communicate via a 200 Gb/s InfiniBand fabric. Each node has eight 200 Gb/s HDR InfiniBand HCAs, providing a theoretical peak node-to-node bandwidth of 200 gigabytes per second in each direction (8 × 200 Gb/s = 1,600 Gb/s).
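The per-node figure follows from simple arithmetic on the numbers quoted above: eight HCAs at 200 Gb/s each. Here is a minimal sketch in Python. The function name is illustrative and not part of any Lambda tooling, and the result is the nominal link rate, before any encoding or protocol overhead.

```python
def aggregate_bandwidth_gbytes(num_hcas: int, link_gbits: float) -> float:
    """Nominal aggregate bandwidth in gigabytes per second, one direction.

    num_hcas   -- number of network adapters in the node
    link_gbits -- nominal rate of each adapter in gigabits per second
    """
    total_gbits = num_hcas * link_gbits  # sum of all links, in Gb/s
    return total_gbits / 8.0             # 8 bits per byte


# Figures from the text: eight HDR InfiniBand HCAs at 200 Gb/s each.
one_way = aggregate_bandwidth_gbytes(8, 200.0)
both_ways = 2 * one_way  # links are full duplex, so both directions can be counted

print(f"per direction: {one_way:.0f} GB/s")
print(f"both directions combined: {both_ways:.0f} GB/s")
```

Real-world throughput will be lower than these nominal values and depends on the workload, the topology, and the software stack.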
World class support
Get phone support directly from an AI infrastructure engineer
Lambda’s team of seasoned HPC experts has successfully deployed thousands of nodes. When you need help with your cluster, we’ll be there for you.
Support that covers your entire cluster
Lambda Echelon support doesn’t just stop at the hardware. Our Premium and Max support tiers provide end-to-end cluster support. Whether you have a hardware, software, or Linux system administration question, we’ll be there to help.
A fraction of the cost of cloud
Industry-Defining TCO
If you're a heavy GPU cloud compute customer, say goodbye to your monthly AWS bill. With Echelon, you can expect a TCO of anywhere from one half to one fifth of what you’re paying on AWS.
Read the Echelon Whitepaper
Learn more about the Lambda Echelon HPC cluster reference design.
Deep learning infrastructure for your data center
Multi-node distributed training
Echelon comes with an optimized network topology. Whether you need a single switch or a three-tier non-blocking fat tree, our network engineers have it covered.
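To get a feel for when a single switch stops being enough, the standard sizing rule for a non-blocking fat tree built from identical switches can be sketched in a few lines of Python. This is a general textbook formula, not Lambda's design method; the function name and the 40-port radix used below are assumptions chosen for illustration.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Maximum endpoints in a non-blocking fat tree of identical switches.

    radix -- ports per switch (must be even so ports split evenly up/down)
    tiers -- 1 for a single switch, 2 for leaf/spine, 3 for a three-tier tree

    At every tier except the top, half the ports face down and half face up,
    which gives 2 * (radix / 2) ** tiers endpoints.
    """
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be an even number >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return 2 * (radix // 2) ** tiers


# Hypothetical example: 40-port switches.
for tiers in (1, 2, 3):
    print(tiers, "tier(s):", max_endpoints(40, tiers), "endpoints")
```

Each endpoint here is one network port, so a node with eight HCAs consumes eight endpoints; divide by eight to estimate the node count a given design can support.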
A one-stop data center shop
Each cluster comes with all of the components racked up, plugged in, and properly labeled. It's shipped to you in a secured rack crate with an integrated ramp for easy installation into your data center or co-location facility.
Engineered for you
Leverage Lambda engineering to design an Echelon cluster that's tailored to your specific deep learning workload.