Titan V Deep Learning Benchmarks with TensorFlow in 2019

In this post, Lambda Labs benchmarks the Titan V's Deep Learning / Machine Learning performance and compares it to other commonly used GPUs. We use the Titan V to train ResNet-50, ResNet-152, Inception v3, Inception v4, VGG-16, AlexNet, and SSD300. We measure the # of images processed per second while training each network.

A few notes:

Titan V - FP32 TensorFlow Performance (1 GPU)

For FP32 training of neural networks, the NVIDIA Titan V is...

  • 42% faster than RTX 2080
  • 41% faster than GTX 1080 Ti
  • 26% faster than Titan XP
  • 4% faster than RTX 2080 Ti
  • 90% as fast as Titan RTX
  • 75% as fast as Tesla V100 (32 GB)

as measured by the # images processed per second during training.

Titan V - FP16 TensorFlow Performance (1 GPU)

For FP16 training of neural networks, the NVIDIA Titan V is..

  • 111% faster than GTX 1080 Ti
  • 94% faster than Titan XP
  • 70% faster than RTX 2080
  • 23% faster than RTX 2080 Ti
  • 87% as fast as Titan RTX
  • 68% as fast as Tesla V100 (32 GB)

as measured by the # images processed per second during training.

FP32 Multi-GPU Scaling Performance (1, 2, 4, 8 GPUs)

For each GPU type (Titan V, RTX 2080 Ti, RTX 2080, etc.) we measured performance while training with 1, 2, 4, and 8 GPUs on each neural networks and then averaged the results. The chart below provides guidance as to how each GPU scales during multi-GPU training of neural networks in FP32. The chart can be read as follows:

  • Using eight Titan Vs will be 5.18x faster than using a single Titan V
  • Using eight Tesla V100s will be 9.68x faster than using a single Titan V
  • Using eight Tesla V100s is 9.68 / 5.18 = 1.87x faster than using eight Titan Vs

Titan V - FP16 vs. FP32

FP16 can reduce training times and enable larger batch sizes/models without significantly impacting model accuracy. Compared with FP32, FP16 training on the Titan V is...

  • 80% faster on ResNet-50
  • 69% faster on ResNet-152
  • 70% faster on Inception v3
  • 51% faster on Inception v4
  • 96% faster on VGG-16
  • 78% faster on AlexNet
  • 57% faster on SSD300

as measured by the # of images processed per second during training. This gives an average speed-up of +71.6%.

Caveat emptor: If you're new to machine learning or simply testing code, we recommend using FP32. Lowering precision to FP16 may interfere with convergence.

GPU Prices

  • Titan V: $2,999.00
  • RTX 2080 Ti: $1,199.00
  • RTX 2080: $799.00
  • Titan RTX: $2,499.00
  • Tesla V100 (32 GB): ~$8,200.00
  • GTX 1080 Ti: $699.00
  • Titan Xp: $1,200.00

Methods

  • For each model we ran 10 training experiments and measured # of images processed per second; we then averaged the results of the 10 experiments.
  • For each GPU / neural network combination, we used the largest batch size that fit into memory. For example, on ResNet-50, the V100 used a batch size of 192; the RTX 2080 Ti use a batch size of 64.
  • We used synthetic data, as opposed to real data, to minimize non-GPU related bottlenecks
  • Multi-GPU training was performed using model-level parallelism

Hardware

Software

  • Ubuntu 18.04 (Bionic)
  • TensorFlow 1.12
  • CUDA 10.0.130
  • cuDNN 7.4.1

Run Our Benchmarks On Your Own Machine

Our benchmarking code is on github. We'd love it if you shared the results with us by emailing s@lambdalabs.com or tweeting @LambdaAPI.

Step #1: Clone Benchmark Repository

git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive

Step #2: Run Benchmark

  • Input a proper gpu_index (default 0) and num_iterations (default 10)
cd lambda-tensorflow-benchmark
./benchmark.sh gpu_index num_iterations

Step #3: Report Results

  • Check the repo directory for folder <cpu>-<gpu>.logs (generated by benchmark.sh)
  • Use the same num_iterations in benchmarking and reporting.
./report.sh <cpu>-<gpu>.logs num_iterations

Raw Benchmark Data

FP32: # Images Processed Per Sec During TensorFlow Training (1 GPU)

Model / GPU RTX 2080 Ti RTX 2080 Titan RTX Titan V V100 Titan Xp 1080 Ti
ResNet-50 294 213 330 300 405 236 209
ResNet-152 110 83 129 107 155 90 81
Inception v3 194 142 221 208 259 151 136
Inception v4 79 56 96 77 112 63 58
VGG16 170 122 195 195 240 154 134
AlexNet 3627 2650 4046 3796 4782 3004 2762
SSD300 149 111 169 156 200 123 108

FP16: # Images Processed Per Sec During TensorFlow Training (1 GPU)

Model / GPU RTX 2080 Ti RTX 2080 Titan RTX Titan V V100 Titan Xp 1080 Ti
ResNet-50 466 329 612 539 811 289 263
ResNet-152 167 124 234 181 305 104 96
Inception v3 286 203 381 353 494 169 156
Inception v4 106 74 154 116 193 67 62
VGG16 255 178 383 383 511 166 149
AlexNet 4988 3458 6627 6746 8922 3104 2891
SSD300 195 153 292 245 350 136 123
!-- Intercom -->