[[ build.model.nick ]]

Choose the price, not the parts. Each model is built with the GPUs, CPU, RAM, and storage that maximize Deep Learning performance per dollar.

[[ build.model.image_alt ]]

[[ build.model.nick ]]

Basic

OS Ubuntu 18.04 + Lambda Stack
GPUs 8x NVIDIA RTX 2080
CPU 2x Intel Xeon E5-2650 v4
Memory 128 GB memory
STORAGE 2 TB SATA SSD (OS install)
EXTRA 4 TB HDD
NETWORK 10 Gbps ethernet

Premium

OS Ubuntu 18.04 + Lambda Stack
GPUs 8x NVIDIA RTX 2080 Ti
CPU 2x Intel Xeon E5-2650 v4
Memory 256 GB memory
STORAGE 2 TB SATA SSD (OS install)
EXTRA 4 TB HDD
NETWORK 10 Gbps ethernet

Max

OS Ubuntu 18.04 + Lambda Stack
GPUs 8x NVIDIA RTX 2080 Ti
CPU 2x Intel Xeon E5-2697 v4
Memory 512 GB memory
STORAGE 2 TB NVMe SSD (OS install)
EXTRA 4 TB HDD
NETWORK 10 Gbps ethernet

Customize

Not seeing what you want?

Add a Protection Plan

3-year protection for [[ getWarrantyPrice(true) ]]

[[ getSubtotal(true)]]
Talk to an engineer
(650) 479-5530

About the [[ build.model.title ]] Basic

GPUs

GPUs are the most critical piece of hardware for Deep Learning. The Lambda Blade Basic has 8x NVIDIA RTX 2080 GPUs (Turing Architecture). Each RTX 2080 has 10.1 TFLOPs of FP32 performance (the standard precision for Deep Learning training) and 8 GB of vRAM. Our benchmarks show that the 2080 is the same speed as the previous generation GTX 1080 Ti.

Processor

During training, the CPUs preprocess data and feed it to the GPUs. Slow processors will cause the GPUs to waste cycles waiting for this data. Core count and PCI-e lane count are important CPU performance factors. More cores means faster data preprocessing; more PCI-e lanes means faster transmission of that data to the GPUs. The Basic has two Intel Xeon E5-2650 v4 (12 cores, 40x PCI-e lanes, each). Its core-to-GPU ratio is 3, which follows the best practice of at least 1 CPU core per GPU. The Basic's CPUs, combined with its PLX-enabled motherboard, provide 16x PCI-e lanes to each GPU (the max possible).
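The core-to-GPU ratio quoted above is simple arithmetic, sketched below. The function name is illustrative only, and the sketch assumes the ratio counts physical cores (not hyperthreads) across both sockets.

```python
def core_to_gpu_ratio(cpu_count: int, cores_per_cpu: int, gpu_count: int) -> float:
    """Total physical CPU cores divided by the number of GPUs."""
    return (cpu_count * cores_per_cpu) / gpu_count


# Basic: two 12-core Xeon E5-2650 v4 CPUs feeding 8 GPUs.
basic_ratio = core_to_gpu_ratio(cpu_count=2, cores_per_cpu=12, gpu_count=8)
print(basic_ratio)  # 3.0

# Rule of thumb from the text: at least 1 CPU core per GPU.
print(basic_ratio >= 1)  # True
```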

Motherboard

A motherboard's PCI-e topology significantly impacts Deep Learning performance. PCI-e lanes are data pipes that enable communication amongst the GPUs and CPU. The number of PCI-e lanes attached to a given device can range from 1 to 16. More lanes is better: for example, a device with 16 PCI-e lanes can send data faster than a device with 4. When training a neural net, the GPUs and CPU send huge amounts of data to each other. To ensure speedy communication, the Basic's motherboard provides each GPU with 16x PCI-e lanes, which is the highest of any motherboard as of 2018.
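As a rough illustration of why lane count matters, the sketch below estimates theoretical one-direction bandwidth per link. It assumes PCI-e 3.0 signaling (8 GT/s per lane with 128b/130b encoding), which this page does not state explicitly; real-world throughput is lower because of protocol overhead.

```python
# Assumption: PCI-e 3.0 (8 gigatransfers per second per lane, 128b/130b encoding).
PCIE3_TRANSFERS_PER_SEC = 8e9
PCIE3_ENCODING_EFFICIENCY = 128 / 130


def pcie3_bandwidth_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCI-e 3.0 link."""
    bits_per_sec = lanes * PCIE3_TRANSFERS_PER_SEC * PCIE3_ENCODING_EFFICIENCY
    return bits_per_sec / 8 / 1e9


print(round(pcie3_bandwidth_gb_per_s(16), 2))  # 15.75
print(round(pcie3_bandwidth_gb_per_s(4), 2))   # 3.94
```

Under these assumptions, a 16-lane link carries four times the data of a 4-lane link in the same amount of time.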

Memory

A Deep Learning computer should have at least as much RAM as GPU memory. For example, a machine with 8x NVIDIA RTX 2080 GPUs should have at least 64 GB of memory (RTX 2080s have 8 GB of memory each). The Basic has 8x RTX 2080 GPUs and 128 GB of memory, so it follows this rule of thumb. If you work with large data sets (e.g. many large images), consider upgrading to the Premium, which has 256 GB of memory.
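The rule of thumb can be checked with a few lines of arithmetic. This is a minimal sketch using the per-GPU memory figures quoted on this page (8 GB for the RTX 2080, 11 GB for the RTX 2080 Ti); the function names are illustrative only.

```python
def min_system_ram_gb(gpu_count: int, vram_per_gpu_gb: int) -> int:
    """Minimum system RAM under the rule of thumb: RAM >= total GPU memory."""
    return gpu_count * vram_per_gpu_gb


def follows_rule(system_ram_gb: int, gpu_count: int, vram_per_gpu_gb: int) -> bool:
    """True if the machine has at least as much RAM as total GPU memory."""
    return system_ram_gb >= min_system_ram_gb(gpu_count, vram_per_gpu_gb)


print(min_system_ram_gb(8, 8))    # 64  (8x RTX 2080)
print(min_system_ram_gb(8, 11))   # 88  (8x RTX 2080 Ti)
print(follows_rule(128, 8, 8))    # True: 128 GB of RAM vs. 64 GB of GPU memory
```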

Storage

Most datasets do not fit in RAM. In such cases, during model training, subsets must be repeatedly swapped in and out of RAM from nonvolatile storage. Such a pipeline requires fast, solid state storage; without it, the GPUs would waste cycles waiting for their next batch of data. The Basic was designed with this constraint in mind; it has two nonvolatile storage devices: a 2 TB solid state drive (fast) for data you're training on now, and a 4 TB hard disk drive (slower) for everything else. Files located in the /data directory are stored on the HDD; all other files are stored on the SSD.

Network

The Basic has 10 Gbps ethernet. For internet traffic, your ISP will almost certainly be the bottleneck. The main benefit of 10 Gbps ethernet (as opposed to the standard 1 Gbps) is fast file transfers between the computers on your network. Multi-node distributed training requires at least 40 Gbps (InfiniBand territory).
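To put the 10 Gbps vs. 1 Gbps difference in concrete terms, the sketch below estimates how long a local file transfer takes at each link speed. It assumes the link runs at its full line rate and ignores protocol overhead and disk speed, so treat the results as best-case figures; the 100 GB dataset size is a made-up example.

```python
def transfer_time_seconds(size_gb: float, link_gbps: float) -> float:
    """Best-case transfer time: size in gigabytes, link speed in gigabits per second."""
    size_gigabits = size_gb * 8  # 8 bits per byte
    return size_gigabits / link_gbps


dataset_gb = 100  # hypothetical dataset size
print(transfer_time_seconds(dataset_gb, 10))  # 80.0 seconds on 10 Gbps ethernet
print(transfer_time_seconds(dataset_gb, 1))   # 800.0 seconds on 1 Gbps ethernet
```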

Who bought a Basic?

About the [[ build.model.title ]] Premium

GPUs

GPUs are the most critical piece of hardware for Deep Learning. The Premium has 8x NVIDIA RTX 2080 Ti GPUs (Turing Architecture). Each RTX 2080 Ti has 13.4 TFLOPs of FP32 performance (the standard precision for Deep Learning training). Our benchmarks show that the 2080 Ti is approximately 30% faster than the previous generation GTX 1080 Ti.

Processor

During training, the CPUs preprocess data and feed it to the GPUs. Slow processors will cause the GPUs to waste cycles waiting for this data. Core count and PCI-e lane count are important CPU performance factors. More cores means faster data preprocessing; more PCI-e lanes means faster transmission of that data to the GPUs. The Premium has two Intel Xeon E5-2650 v4 (12 cores, 40x PCI-e lanes, each). Its core-to-GPU ratio is 3, which follows the best practice of at least 1 CPU core per GPU. The Premium's CPUs, combined with its PLX-enabled motherboard, provide 16x PCI-e lanes to each GPU (the max possible).

Motherboard

A motherboard's PCI-e topology significantly impacts Deep Learning performance. PCI-e lanes are data pipes that enable communication amongst the GPUs and CPU. The number of PCI-e lanes attached to a given device can range from 1 to 16. More lanes is better: for example, a device with 16 PCI-e lanes can send data faster than a device with 4. When training a neural net, the GPUs and CPU send huge amounts of data to each other. To ensure speedy communication, the Premium's motherboard provides each GPU with 16x PCI-e lanes, which is the highest of any motherboard as of 2018.

Memory

A Deep Learning computer should have at least as much RAM as GPU memory. For example, a machine with 8x NVIDIA RTX 2080 Ti GPUs should have at least 88 GB of memory (RTX 2080 Tis have 11 GB of memory each). The Premium has 8x 2080 Ti GPUs and 256 GB of memory, so it follows this rule of thumb. If you work with large data sets (e.g. many large images), consider upgrading to the Max, which has 512 GB of memory.

Storage

Most datasets do not fit in RAM. In such cases, during model training, subsets must be repeatedly swapped in and out of RAM from nonvolatile storage. Such a pipeline requires fast, solid state storage; without it, the GPUs would waste cycles waiting for their next batch of data. The Premium was designed with this constraint in mind; it has two nonvolatile storage devices: a 2 TB solid state drive (fast) for data you're training on now, and a 4 TB hard disk drive (slower) for everything else. Files located in the /data directory are stored on the HDD; all other files are stored on the SSD.

Network

The Premium has 10 Gbps ethernet. For internet traffic, your ISP will almost certainly be the bottleneck. The main benefit of 10 Gbps ethernet (as opposed to the standard 1 Gbps) is fast file transfers between the computers on your network. Multi-node distributed training requires at least 40 Gbps (InfiniBand territory).

Who bought a Premium?

About the [[ build.model.title ]] Max

GPUs

GPUs are the most critical piece of hardware for Deep Learning. The Lambda Blade Max has 8x NVIDIA RTX 2080 Ti GPUs (Turing Architecture). Each RTX 2080 Ti has 13.4 TFLOPs of FP32 performance (the standard precision for Deep Learning training). Our benchmarks show that the 2080 Ti is approximately 30% faster than the previous generation GTX 1080 Ti.

Processor

During training, the CPUs preprocess data and feed it to the GPUs. Slow processors will cause the GPUs to waste cycles waiting for this data. Core count and PCI-e lane count are important CPU performance factors. More cores means faster data preprocessing; more PCI-e lanes means faster transmission of that data to the GPUs. The Max has two Intel Xeon E5-2697 v4 (18 cores, 40x PCI-e lanes, each). Its core-to-GPU ratio is 4.5, which follows the best practice of at least 1 CPU core per GPU. The Max's CPUs, combined with its PLX-enabled motherboard, provide 16x PCI-e lanes to each GPU (the max possible).

Motherboard

A motherboard's PCI-e topology significantly impacts Deep Learning performance. PCI-e lanes are data pipes that enable communication amongst the GPUs and CPU. The number of PCI-e lanes attached to a given device can range from 1 to 16. A device with 16 PCI-e lanes can send data faster than a device with 4. When training a neural net, the GPUs and CPU send huge amounts of data to each other. To ensure speedy communication, the Max's motherboard provides each GPU with 16x PCI-e lanes, which is the highest of any motherboard as of 2018.

Memory

A Deep Learning computer should have at least as much RAM as GPU memory. For example, a machine with 8x NVIDIA RTX 2080 Ti GPUs should have at least 88 GB of memory (RTX 2080 Tis have 11 GB of memory each). The Max has 8x RTX 2080 Ti GPUs and 512 GB of memory, so it follows this rule of thumb. The 512 GB comes standard, which leaves headroom if you work with large data sets (e.g. many large images).

Storage

Most datasets do not fit in RAM. In such cases, during model training, subsets must be repeatedly swapped in and out of RAM from nonvolatile storage. Such a pipeline requires fast, solid state storage; without it, the GPUs would waste cycles waiting for their next batch of data. The Max was designed with this constraint in mind; it has two nonvolatile storage devices: a 2 TB NVMe solid state drive (fast) for data you're training on now, and a 4 TB hard disk drive (slower) for everything else. Files located in the /data directory are stored on the HDD; all other files are stored on the SSD.

Network

The Max has 10 Gbps ethernet. For internet traffic, your ISP will almost certainly be the bottleneck. The main benefit of 10 Gbps ethernet (as opposed to the standard 1 Gbps) is fast file transfers between the computers on your network. Multi-node distributed training requires at least 40 Gbps (InfiniBand territory).

Who bought a Max?

[[ component.name ]]

[[ option.description ]] [[ build.getPriceDiff(component, option)]]