RTX 2080 Ti Deep Learning Benchmarks

To see a side-by-side comparison of the RTX 2080 Ti vs Titan V vs V100, see our latest blog post.

The new Turing architecture RTX 2080 Ti GPU

In this post, we benchmark the NVIDIA RTX 2080 Ti on several Deep Learning training tasks using TensorFlow. We compare performance of the Turing 2080 Ti to the Pascal 1080 Ti, the incumbent for best deep learning GPU in 2018.

TL;DR

  • The RTX 2080 Ti’s single-precision (FP32) training of CNNs with TensorFlow is between 27% and 45% faster than the 1080 Ti for measured networks.
  • The RTX 2080 Ti’s half-precision (FP16) training of CNNs with TensorFlow is between 60% and 65% faster than the 1080 Ti for measured networks.
  • If you do FP16 training, the RTX 2080 Ti is probably worth the extra money. If you don't, you'll need to decide whether a 71% increase in cost is worth an average 36% increase in performance.
The 2080 Ti offers a 1.41x speedup over the 1080 Ti for single precision ResNet-152 training.
The 2080 Ti offers a 1.65x speedup over the 1080 Ti for half precision ResNet-152 training.
Performance comparison for the 2080 Ti across a variety of models.
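The TL;DR percentages can be sanity-checked with simple arithmetic; this sketch assumes the launch prices quoted later in this post ($700 and $1,200) and the per-model FP32 speedups from the table below.

```python
# Back-of-the-envelope check of the TL;DR numbers.
price_1080ti, price_2080ti = 700, 1200
cost_increase = price_2080ti / price_1080ti - 1  # ~0.71, i.e. 71%

# Per-model FP32 speedups from the benchmark table below.
fp32_speedups = [1.33, 1.40, 1.45, 1.42, 1.27, 1.30, 1.38]
avg_speedup = sum(fp32_speedups) / len(fp32_speedups) - 1  # ~0.36, i.e. 36%

print(f"cost increase: {cost_increase:.0%}, avg FP32 speedup: {avg_speedup:.0%}")
```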

Raw Benchmark Data

Single-precision performance of 2080 Ti and 1080 Ti

We benchmarked the 2080 Ti and 1080 Ti on single-precision (FP32) training of commonly used TensorFlow models, measuring images processed per second (images / second) during training. The benchmark code we wrote to obtain these results can be found here. See our methods section for more info.

2080 Ti Speedup over 1080 Ti for FP32 Training

Model / GPU    2080 Ti    1080 Ti
ResNet-152     1.33       1.00
ResNet-50      1.40       1.00
InceptionV3    1.45       1.00
InceptionV4    1.42       1.00
VGG16          1.27       1.00
AlexNet        1.30       1.00
SSD300         1.38       1.00

Raw FP32 training speeds (images / second)

Model / GPU    2080 Ti    1080 Ti
ResNet-50      286        203
ResNet-152     110        82
InceptionV3    189        130
InceptionV4    81         56
VGG16          169        133
AlexNet        3550       2720
SSD300         148        107
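The speedup table above is derived from these raw throughputs by dividing the 2080 Ti's images / sec by the 1080 Ti baseline; a quick Python sketch (last-digit discrepancies vs. the published speedups come from rounding in the raw data):

```python
# Deriving the FP32 speedup table from the raw throughput numbers:
# speedup = (2080 Ti images/sec) / (1080 Ti images/sec) for each model.
raw_fp32 = {  # model: (2080 Ti img/s, 1080 Ti img/s)
    "ResNet-50":   (286, 203),
    "ResNet-152":  (110, 82),
    "InceptionV3": (189, 130),
    "InceptionV4": (81, 56),
    "VGG16":       (169, 133),
    "AlexNet":     (3550, 2720),
    "SSD300":      (148, 107),
}
for model, (rtx, gtx) in raw_fp32.items():
    print(f"{model}: {rtx / gtx:.2f}x speedup")
```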

Half-precision performance of 2080 Ti and 1080 Ti

Half-precision arithmetic is sufficient for training many networks. In this section, we benchmark the 2080 Ti and 1080 Ti on half-precision (FP16) training of VGG16 and ResNet-152; we measure images processed per second (images / second) during training. We use Yusaku Sako's benchmark scripts.
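As a side note on why FP16 training works but needs care: FP16's narrow exponent range can underflow small gradient values, which is why mixed-precision setups typically use loss scaling and keep FP32 master weights. A minimal NumPy illustration (not part of the benchmark scripts themselves):

```python
import numpy as np

# FP16 halves memory traffic and, on Turing, maps onto Tensor Cores, but
# anything below ~6e-8 (float16's smallest subnormal) underflows to zero.
tiny_grad = np.float16(1e-8)
print(tiny_grad)  # 0.0 -- the gradient is lost

# Loss scaling multiplies the loss (and hence the gradients) by a constant
# so small gradients stay representable in FP16 ...
scale = 1024.0
scaled = np.float16(1e-8 * scale)  # ~1e-5, representable in FP16
print(scaled > 0)  # True

# ... and the scale is divided back out in FP32 before the weight update
# (the "FP32 master weights" pattern used in mixed-precision training).
unscaled = np.float32(scaled) / scale
print(unscaled)  # ~1e-8, nonzero again
```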

2080 Ti Speedup for FP16/FP32 Training

Model        2080 Ti FP32    1080 Ti FP32    2080 Ti FP16    1080 Ti FP16
ResNet-152   1.41            1.00            1.65            1.00
VGG16        1.25            1.00            1.60            1.00

Raw FP16/FP32 training speeds (images / second)

Model        2080 Ti FP32    1080 Ti FP32    2080 Ti FP16    1080 Ti FP16
ResNet-152   75.18           53.45           103.29          62.74
VGG16        163.26          130.80          238.45          149.39
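These raw numbers also let us compare FP16 against FP32 on the same card, which isolates the benefit FP16 brings to each architecture; a quick check in Python:

```python
# ResNet-152 throughput (images/sec) from the raw table above.
resnet152 = {"2080 Ti FP32": 75.18, "1080 Ti FP32": 53.45,
             "2080 Ti FP16": 103.29, "1080 Ti FP16": 62.74}

# Card-vs-card speedups (matches the table: 1.41x FP32, 1.65x FP16).
print(resnet152["2080 Ti FP32"] / resnet152["1080 Ti FP32"])  # ~1.41
print(resnet152["2080 Ti FP16"] / resnet152["1080 Ti FP16"])  # ~1.65

# FP16-over-FP32 gain on each card: the 2080 Ti gets a bigger boost
# (~1.37x) than the 1080 Ti does from FP16 alone (~1.17x).
print(resnet152["2080 Ti FP16"] / resnet152["2080 Ti FP32"])  # ~1.37
print(resnet152["1080 Ti FP16"] / resnet152["1080 Ti FP32"])  # ~1.17
```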

2080 Ti vs 1080 Ti - Speedup / $ - Cost Efficiency

Because both cards have 11 GB of memory, we're also going to look at performance per dollar. Our metric here is speedup per dollar (speedup / $). Here's how the two cards stack up. For both FP32 and FP16, the 1080 Ti still wins on a per-dollar basis.

However, for FP16 ResNet-152 training in the Yusaku Sako benchmarks (recommended by Tim Dettmers), the 1080 Ti's cost-efficiency advantage is only 4%. For FP32, its advantage is 21% for ResNet-152 and 37% for VGG16. We priced the 1080 Ti and 2080 Ti at their launch prices of $700 and $1,200, respectively.

So, if you do FP32 training, the 1080 Ti may still be a better choice for you, especially if you are cash conscious. That said, most of you reading this will probably want to get the 2080 Ti.

Raw Cost Efficiency Data

FP16 Speedup / $

Model / GPU    2080 Ti    1080 Ti    1080 Ti Cost Efficiency Gain
ResNet-152     0.00137    0.00143    1.04
VGG16          0.00133    0.00143    1.07

FP32 Speedup / $

Model / GPU    2080 Ti    1080 Ti    1080 Ti Cost Efficiency Gain
ResNet-152     0.00117    0.00143    1.21
VGG16          0.00104    0.00143    1.37
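The cost-efficiency numbers follow directly from the speedups and launch prices; a short sketch reproducing the FP16 rows:

```python
# Reproducing the Speedup / $ rows: cost efficiency = speedup / launch price.
prices = {"2080 Ti": 1200, "1080 Ti": 700}  # launch prices in USD
fp16_speedups = {"ResNet-152": 1.65, "VGG16": 1.60}  # 1080 Ti baseline = 1.00

for model, speedup in fp16_speedups.items():
    eff_2080 = speedup / prices["2080 Ti"]
    eff_1080 = 1.00 / prices["1080 Ti"]
    gain = eff_1080 / eff_2080  # how much more speedup/$ the 1080 Ti delivers
    print(f"{model}: 2080 Ti {eff_2080:.5f}, 1080 Ti {eff_1080:.5f}, "
          f"1080 Ti gain {gain:.2f}")
```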

Methods

  • For each model we ran 10 training experiments and measured images processed per second; we then took the average of these 10 experiments.
  • The speedup is calculated by dividing a card's images / sec score by the minimum images / sec score for that model (here, the 1080 Ti's). This shows the relative improvement over the baseline.
  • The 2080 Ti has Tensor Cores, which the FP16 benchmarks take advantage of.

Hardware

  • Lambda Quad Basic
  • RAM: 64 GB DDR4 2400 MHz
  • Processor: Intel Xeon E5-1650 v4
  • Motherboard: ASUS X99-E WS/USB 3.1
  • GPU: EVGA XC 2080 Ti GPU TU102 and ASUS 1080 Ti Turbo GP102

Software

  • Ubuntu 18.04 (Bionic)
  • TensorFlow 1.11.0-rc1
  • CUDA 10.0.130
  • cuDNN 7.3

Reproduce these benchmarks yourself

We've published our benchmarking code on GitHub. Please feel free to reproduce these results on your own machine and with different GPUs. We'd love it if you shared those results with us by emailing s@lambdalabs.com or tweeting @LambdaAPI.

Step One: Clone benchmark repo

git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive

Step Two: Run benchmark

  • Input a proper gpu_index (default 0) and num_iterations (default 10)
cd lambda-tensorflow-benchmark
./benchmark.sh gpu_index num_iterations

Step Three: Report results

  • Check the repo directory for folder <cpu>-<gpu>.logs (generated by benchmark.sh)
  • Use the same num_iterations in benchmarking and reporting.
./report.sh <cpu>-<gpu>.logs num_iterations

Batch Sizes Used

Model        Batch Size
ResNet-50    64
ResNet-152   32
InceptionV3  64
InceptionV4  16
VGG16        64
AlexNet      512
SSD          32

Future Work

  • Benchmark the 2080 Ti with multiple GPUs, with and without the NVLink connector.
  • Benchmark the new 2080.

We are now taking pre-orders for the Lambda Quad 2080 Ti workstation.
