Lambda Echelon

GPU clusters designed for deep learning

Accelerate your team's AI progress with a Lambda Echelon HPC cluster.

Echelon clusters check all of the boxes

  • [✓] NVIDIA GPU Compute

    We use Lambda Scalar and Hyperplane servers with NVIDIA Tensor Core GPUs as the building blocks for your Echelon cluster. With an Echelon cluster, your team will be training in minutes instead of days.

  • [✓] Fast, All-Flash Storage

    Lambda-engineered network storage servers provide rapid access to your training and inference data. The Lambda storage administration dashboard makes managing your cluster a breeze.

  • [✓] HPC Networking

    Rely on Lambda's HPC engineers to design an optimal network topology for you. Whether you want Ethernet or InfiniBand, Echelon network designs leverage GPU Direct RDMA to accelerate multi-node distributed training and data access.

  • [✓] Enterprise White Glove Support

    Lambda Echelon comes racked, stacked, labeled, and cabled. It doesn't stop there: every cluster is backed by engineers who love to go above and beyond, and comes with access to the expertise of the engineers who designed it. We abstract away the complexity of HPC clusters so you can focus on what you do best. Just roll it out of the crate, plug it in, and start training.

All-in-one Rack Level Solution

A single vendor for your entire cluster

  • One relationship to rule them all

    Working with Lambda means you won't be duct-taping a solution together from multiple vendors. Your cluster’s compute, network, and storage are all provided by Lambda, meaning your procurement process is greatly simplified.

  • Compute, Storage, and Networking that work together

    Echelon clusters have compute, storage, and networking architectures that are validated by Lambda HPC engineers and detailed in our whitepaper.

  • Shipped to you, ready to roll

    Echelon clusters can be shipped to you fully assembled, ready to roll out of their rack crate and onto the floor of your data center. All you need to do is plug them in.

Compute

A cluster of Lambda GPU servers

  • Designed for your team’s use case

    Lambda is a team of HPC experts and published AI researchers. Whether you’re looking to set up a traditional HPC cluster, or a cluster for distributed training of language models, we’ll engineer a cluster for your precise use case.

  • Endlessly customizable

    The Echelon cluster design process begins with a Lambda Hyperplane or Lambda Scalar server configuration. This becomes the core compute node used throughout the cluster.

Storage

All-Flash Network Storage with Management Dashboard

  • Access your data sets and checkpoints at the speed of flash

    Storage clusters can often become the bottleneck in large-scale deployments. The Echelon reference design combines a high-speed storage fabric with local NVMe flash caches to dramatically speed up data transfer rates during training.

  • Support for dozens of vendors

    Lambda has pre-existing OEM relationships with practically every storage appliance provider in the world.

  • Manage your storage cluster with an easy-to-use web dashboard

    Manage your cluster’s storage via the Lambda storage management dashboard. You can easily create and destroy network attached storage volumes, spin up virtual machines on your storage devices, and manage access control.

Networking

Blazing fast HPC networking

  • Spine-and-leaf topology with 100% port-to-port bandwidth

    Echelon compute nodes communicate via a 200 Gb/s HDR InfiniBand fabric. Each node has eight 200 Gb/s HDR InfiniBand HCAs, providing a theoretical peak node-to-node bandwidth of up to 200 gigabytes per second (8 × 200 Gb/s = 1.6 Tb/s).
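
The per-node figure follows from simple arithmetic on the numbers quoted above. The short Python sketch below reproduces it; the function name is ours, and the result is a nominal line-rate ceiling that ignores protocol overhead, so measured throughput will be lower.

```python
# Aggregate per-node InfiniBand bandwidth, using the figures quoted above.
# Nominal link rates only: encoding and protocol overhead are ignored.
HCAS_PER_NODE = 8        # eight HDR InfiniBand HCAs per node
LINK_RATE_GBIT_S = 200   # nominal HDR rate per HCA, in gigabits per second
BITS_PER_BYTE = 8


def node_bandwidth_gbyte_s(hcas=HCAS_PER_NODE, link_gbit_s=LINK_RATE_GBIT_S):
    """Return the theoretical peak per-node bandwidth in gigabytes per second."""
    return hcas * link_gbit_s / BITS_PER_BYTE


if __name__ == "__main__":
    total_gbit_s = HCAS_PER_NODE * LINK_RATE_GBIT_S
    peak = node_bandwidth_gbyte_s()
    print(f"{HCAS_PER_NODE} x {LINK_RATE_GBIT_S} Gb/s = {total_gbit_s} Gb/s = {peak:.0f} GB/s")
```

Changing the two constants gives the equivalent ceiling for other configurations, for example four 100 Gb/s adapters per node.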

Support

World class support

  • Get phone support directly from an AI infrastructure engineer

    Lambda’s team of seasoned HPC experts has successfully deployed thousands of nodes. When you need help with your cluster, we’ll be there for you.

  • Support that covers your entire cluster

    Lambda Echelon support doesn’t just stop at the hardware. Our Premium and Max support tiers provide end-to-end cluster support. Whether you have a hardware, software, or Linux system administration question, we’ll be there to help.

AI Economics

A fraction of the cost of cloud

  • Industry Defining TCO

    If you're a heavy GPU cloud compute customer, say goodbye to your monthly AWS bill. With Echelon, you can expect a TCO of anywhere from one half to one fifth of what you’re paying on AWS.

Read the Echelon Whitepaper

Learn more about the Lambda Echelon HPC cluster reference design.

Deep learning infrastructure for your data center

  • Multi-node distributed training

    Echelon comes with an optimized network topology. Whether you need a single switch or a three-tier non-blocking fat tree, our network engineers have it covered.

  • A one-stop data center shop

    Each cluster comes with all of the components racked up, plugged in, and properly labeled. It's shipped to you in a secured rack crate with an integrated ramp for easy installation into your data center or co-location facility.

  • Engineered for you

    Leverage Lambda engineering to design an Echelon cluster that's tailored to your specific deep learning workload.
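
As a rough illustration of why a cluster might need anything from a single switch to a three-tier non-blocking fat tree, the Python sketch below applies the standard fat-tree sizing rule: with identical switches of even radix k, one tier serves k endpoint ports, two tiers serve k²/2, and three tiers serve k³/4. The 40-port radix and the node counts are hypothetical values chosen for the example, not Echelon specifications. The eight fabric ports per node come from the networking description above.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoint ports a non-blocking fat tree of identical switches can serve."""
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be an even number of ports, at least 2")
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    # Each tier below the top splits its ports evenly between downlinks and
    # uplinks; the top tier uses every port as a downlink.
    return 2 * (radix // 2) ** tiers


def tiers_needed(endpoints: int, radix: int) -> int:
    """Smallest number of switch tiers that serves `endpoints` without blocking."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


if __name__ == "__main__":
    RADIX = 40          # hypothetical switch port count, for illustration only
    PORTS_PER_NODE = 8  # eight InfiniBand HCAs per compute node
    for nodes in (4, 20, 200):  # hypothetical cluster sizes
        endpoints = nodes * PORTS_PER_NODE
        print(f"{nodes} nodes ({endpoints} ports): {tiers_needed(endpoints, RADIX)} tier(s)")
```

This only counts ports. A real design also weighs rail layout, cable lengths, storage and management networks, and room for growth, which is where the network engineers come in.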