Machine Learning Performance: Titan RTX vs. 2080 Ti vs. 1080 Ti vs. Titan Xp vs. Titan V vs. Tesla V100.
At Lambda Labs, we're getting a lot of inquiries about the Deep Learning performance of NVIDIA's new Titan RTX GPU. In this post, we benchmark the speed of the Titan RTX on various AI training tasks. This post includes AI benchmarks of the following GPUs:
- Titan RTX
- RTX 2080 Ti
- Tesla V100 (32 GB)
- GTX 1080 Ti
- Titan Xp
- Titan V
All benchmarks were performed on the Lambda Dual, a workstation with 2x Titan RTX GPUs. Only single-GPU training speeds on popular neural nets are covered here; a follow-up will discuss multi-GPU performance using the Lambda Blade, our GPU server with 8x Titan RTX GPUs.
Summary of Results
For each GPU, we measured the # of images processed per second while training the following neural networks: ResNet50, ResNet152, Inception3, Inception4, VGG16, AlexNet, and SSD.
For FP32 training, the Titan RTX is (on average)...
- 8% faster than the RTX 2080 Ti
- 46.8% faster than the GTX 1080 Ti
- 31.4% faster than the Titan Xp
- 4% faster than the Titan V
- 13.7% slower than the Tesla V100 (32 GB)
For FP16 training, the Titan RTX is (on average)...
- 21.4% faster than the RTX 2080 Ti
- 209.7% faster than the GTX 1080 Ti
- 192.1% faster than the Titan Xp
- 1.6% slower than the Titan V
- Stay tuned for comparison to the V100 (32 GB)
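The percentages above are computed from raw throughput in images per second. A minimal sketch of the calculation, using made-up throughput numbers rather than the measured results:

```python
def percent_diff(candidate_ips, baseline_ips):
    """Percent speed difference of a candidate GPU vs. a baseline,
    both measured in images processed per second."""
    return (candidate_ips / baseline_ips - 1.0) * 100.0

# Hypothetical throughputs (not the measured benchmark numbers):
print(round(percent_diff(216.0, 200.0), 1))   # 8.0   -> "8% faster"
print(round(percent_diff(172.6, 200.0), 1))   # -13.7 -> "13.7% slower"
```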
Conclusions - What's the best GPU for Machine Learning / Deep Learning in 2019?
- RTX 2080 Ti is the best GPU for Machine Learning / Deep Learning if... 11 GB of GPU memory is sufficient for your training needs (for many people, it is). The 2080 Ti offers the best price/performance among the Titan RTX, Tesla V100, Titan V, GTX 1080 Ti, and Titan Xp.
- Titan RTX is the best GPU for Machine Learning / Deep Learning if... 11 GB of memory isn't sufficient for your training needs. However, before concluding this, try training at half precision (16-bit). Half precision effectively doubles your GPU memory, at a possible small cost in training accuracy. If you're already training successfully at FP16 and 11 GB still isn't enough, choose the Titan RTX; otherwise, go with the RTX 2080 Ti. At half precision, the Titan RTX's 24 GB of memory effectively behaves like 48 GB.
- Tesla V100 is the best GPU for Machine Learning / Deep Learning if... price isn't a constraint, you need every bit of GPU memory available, or your product's time to market is of utmost importance.
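The memory-doubling effect of half precision noted above is easy to verify: an FP16 tensor occupies half the bytes of its FP32 counterpart. A minimal numpy sketch (numpy is used here only for illustration; the benchmarks themselves run TensorFlow):

```python
import numpy as np

# Same tensor shape stored at single (FP32) vs. half (FP16) precision.
fp32 = np.zeros((1024, 1024), dtype=np.float32)
fp16 = np.zeros((1024, 1024), dtype=np.float16)

print(fp32.nbytes // 2**20, "MB")  # 4 MB
print(fp16.nbytes // 2**20, "MB")  # 2 MB -- half the footprint
```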
Methods
- All models were trained on a synthetic dataset to isolate GPU performance from CPU pre-processing performance and reduce spurious I/O bottlenecks.
- For each GPU/model pair, 10 training experiments were conducted and then averaged.
- The "Normalized Training Performance" of a GPU is calculated by dividing its images / sec performance on a specific model by the images / sec performance of the 1080 Ti on that same model.
- The Titan RTX, 2080 Ti, Titan V, and V100 benchmarks utilized Tensor Cores.
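The normalization described above is a simple ratio; it can be sketched as follows, with hypothetical throughputs rather than the measured results:

```python
def normalized_performance(gpu_ips, gtx_1080ti_ips):
    """Normalized Training Performance: a GPU's images/sec on a model
    divided by the 1080 Ti's images/sec on the same model."""
    return gpu_ips / gtx_1080ti_ips

# Hypothetical example: 300 images/sec vs. a 1080 Ti baseline of 200 images/sec
print(normalized_performance(300.0, 200.0))  # 1.5
```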
Test Setup
Hardware: Lambda Dual, a desktop computer with 2x Titan RTX, an Intel Core i9-7920X, and 64 GB of RAM. To benchmark each GPU, we simply swapped it into this same machine.
Software:
- Ubuntu 18.04
- TensorFlow: v1.11.0
- CUDA: 10.0.130
- cuDNN: 7.4.1
- NVIDIA Driver: 415.25
The tables below display the raw performance of each GPU while training in FP32 mode (single precision) and FP16 mode (half-precision), respectively. Note that the unit measured is # of images processed per second and we rounded results to the nearest integer.
FP32 - Number of images processed per second
| Model / GPU | Titan RTX | 1080 Ti | Titan Xp | Titan V | 2080 Ti | V100 |
| --- | --- | --- | --- | --- | --- | --- |
FP16 - Number of images processed per second
| Model / GPU | Titan RTX | 1080 Ti | Titan Xp | Titan V | 2080 Ti |
| --- | --- | --- | --- | --- | --- |
Reproduce the benchmarks yourself
All benchmarking code is available in Lambda Labs' GitHub repo. Share your results by emailing firstname.lastname@example.org or by tweeting @LambdaAPI. Be sure to include the hardware specifications of the machine you used.
Step One: Clone benchmark repo
```
git clone https://github.com/lambdal/lambda-tensorflow-benchmark.git --recursive
```
Step Two: Run benchmark
- Input a proper gpu_index (default 0) and num_iterations (default 10)
```
cd lambda-tensorflow-benchmark
./benchmark.sh gpu_index num_iterations
```
Step Three: Report results
- Check the repo directory for the folder `<cpu>-<gpu>.logs` (generated by benchmark.sh)
- Use the same num_iterations for both benchmarking and reporting:

```
./report.sh <cpu>-<gpu>.logs num_iterations
```