The Lambda Deep Learning Blog

Crowdsourced Deep Learning GPU Benchmarks from the Community

Written by Stephen Balaban | Oct 12, 2018 4:00:00 AM

We open-sourced the benchmarking code we use at Lambda Labs so that anybody can reproduce the benchmarks we publish or run their own. We encourage people to send us their results, and we will continue to publish them here. To submit, run the code, then email your results to benchmarks@lambdalabs.com or tweet them to @LambdaAPI. This is the official page for all Lambda Community Benchmarks.

How to get your results published here

Component        | Version
-----------------|------------
CPU              | $(cat /proc/cpuinfo | grep 'model name' | uniq | awk -F: '{ print $2 }')
Distro           | $(lsb_release -ds)
Kernel Version   | $(uname -r)
Kernel Arch      | $(uname -m)
GPU              | $(sudo lspci | grep VGA\ compat | head -n1)
Tensorflow       | $(python -c 'import tensorflow;print(tensorflow.__version__)' 2> /dev/null)
NVIDIA Driver    | $(head -n1 /proc/driver/nvidia/version | awk '{ print $8 }')
CUDA             | $(nvcc --version | tail -n 1 | grep Cuda | awk '{ print $6 }')
cuDNN            | $(cat /usr/include/cudnn.h | grep -P 'define\ CUDNN_MAJOR|define\ CUDNN_MINOR|define\ CUDNN_PATCHLEVEL' | awk '{ print $3 }' | sed ':a;N;$!ba;s/\n/./g')
Python           | $(python --version 2>&1)

Copy the template above into a file named template.txt, then run the commands below to fill in the table and save it as specs-table.txt.

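# Create template.txt: paste the template in (CTRL-V, or CTRL-SHIFT-V in most Linux terminals),
# then press CTRL-D on an empty line to end the file.
cat > template.txt

# Read template.txt line by line, run each line's $(...) command, and write the filled-in table.
while IFS= read -r line || [ -n "$line" ]; do eval "echo \"$line\""; done < template.txt > specs-table.txt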
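If pasting into an interactive `cat` is awkward, you can write the template with a quoted heredoc instead. Quoting the `'EOF'` delimiter keeps the `$(...)` commands literal in the file, so they run only when the `eval` loop reaches them. Below is a minimal sketch under two assumptions. First, it keeps only the two kernel rows, so it runs on a machine without a GPU. Second, the file names template-demo.txt and specs-demo.txt are hypothetical, chosen so that a real template.txt is not overwritten. In practice you would paste the full template between the EOF lines.

```shell
#!/usr/bin/env bash
# Write a shortened template; the quoted 'EOF' stops $(...) from being expanded now.
cat > template-demo.txt <<'EOF'
Component        | Version
-----------------|------------
Kernel Version   | $(uname -r)
Kernel Arch      | $(uname -m)
EOF

# Expand each line: eval runs the embedded command, echo prints the filled-in row.
# The "|| [ -n "$line" ]" keeps a final line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
  eval "echo \"$line\""
done < template-demo.txt > specs-demo.txt

cat specs-demo.txt
```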

Crowdsourced Results

Here are the results that have been submitted to us by third parties.

Summary (Stanislav Brizitsky)

Component        | Version
-----------------|------------
CPU              | AMD Phenom(tm) II X6 1075T Processor
Distro           | Ubuntu 19.04
Kernel Version   | 5.0.0-31-generic
Kernel Arch      | x86_64
GPU              | 04:00.0 VGA compatible controller: NVIDIA Corporation TU102 GeForce RTX 2080 Ti (rev a1)
Tensorflow       | 1.14.0
NVIDIA Driver    | 430.50
CUDA             | V10.1.105
cuDNN            | 7.6.4
Python           | Python 3.7.3

Benchmark results

Model        | Input size | Param memory | Feature memory | FLOPs (billions)
-------------|------------|--------------|----------------|-----------------
resnet-50    | 224 x 224  | 98 MB        | 103 MB         | 4
resnet-152   | 224 x 224  | 230 MB       | 219 MB         | 11
inception-v3 | 299 x 299  | 91 MB        | 89 MB          | 6
vgg-vd-19    | 224 x 224  | 548 MB       | 63 MB          | 20
alexnet      | 227 x 227  | 233 MB       | 3 MB           | 1.5
ssd-300      | 300 x 300  | 100 MB       | 116 MB         | 31
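As a rough sanity check on the parameter-memory column, assume 32-bit floats (4 bytes per parameter) and read MB as MiB. ResNet-50's commonly cited count of about 25.6 million parameters then works out as follows:

```latex
\text{param memory} \approx N_{\text{params}} \times 4\ \text{B}
  = 25.6 \times 10^{6} \times 4\ \text{B}
  \approx 1.02 \times 10^{8}\ \text{B}
  \approx 97.7\ \text{MiB}
```

That is close to the 98 MB listed for resnet-50. Note that the 25.6 million figure is an outside number, not something this table states.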

syn-replicated-fp32-1gpus

Model      | X6-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 294.45
resnet152  | 107.72
inception3 | 193.16
inception4 | 76.11
vgg16      | 176.88
alexnet    | 3665.52
ssd300     | 150.39

syn-parameter_server-fp32-1gpus

Model      | X6-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 291.69
resnet152  | 107.40
inception3 | 192.78
inception4 | 76.10
vgg16      | 176.89
alexnet    | 3673.53
ssd300     | 150.30

syn-replicated-fp16-1gpus

Model      | X6-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 462.98
resnet152  | 172.64
inception3 | 284.54
inception4 | 104.68
vgg16      | 261.92
alexnet    | 4755.02
ssd300     | 194.10

syn-parameter_server-fp16-1gpus

Model      | X6-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 468.89
resnet152  | 176.30
inception3 | 287.66
inception4 | 108.32
vgg16      | 266.71
alexnet    | 4856.88
ssd300     | 197.04
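One way to read the fp32 and fp16 results together is as a per-model speedup. The sketch below uses the X6-GeForce_RTX_2080_Ti numbers from the syn-replicated-fp32-1gpus and syn-replicated-fp16-1gpus results above, copied in by hand. For each model, it prints the fp16 result divided by the fp32 result. The output file name fp16-speedup.txt is hypothetical.

```shell
#!/usr/bin/env bash
# Columns: model, fp32 result, fp16 result (syn-replicated, 1 GPU, X6-GeForce_RTX_2080_Ti).
# awk prints fp16 / fp32 for each model; tee also saves the output to fp16-speedup.txt.
awk '{ printf "%-10s %.2fx\n", $1, $3 / $2 }' <<'EOF' | tee fp16-speedup.txt
resnet50    294.45  462.98
resnet152   107.72  172.64
inception3  193.16  284.54
inception4   76.11  104.68
vgg16       176.88  261.92
alexnet    3665.52 4755.02
ssd300      150.39  194.10
EOF
```

Swapping in the two parameter_server tables gives the same ratio for that variable-update mode.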

Summary (Antonio Marin)

RTX 2080 Ti benchmark

Specifications

Component        | Version
-----------------|------------
CPU              | Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
Distro           | Ubuntu 16.04.6 LTS
Kernel Version   | 4.15.0-52-generic
Kernel Arch      | x86_64
GPU              | 65:00.0 VGA compatible controller: NVIDIA Corporation GV102 (rev a1)
Tensorflow       | 1.12.0
NVIDIA Driver    | 418.56
CUDA             | V7.5.17
cuDNN            | 7.3.1
Python           | Python 3.6.8 :: Anaconda, Inc.

Benchmark results

syn-replicated-fp32-1gpus

Model      | i9-7900X-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 318.45
resnet152  | 121.54
inception3 | 210.28
inception4 | 88.72
vgg16      | 186.87
alexnet    | 3877.75
ssd300     | 162.28

syn-parameter_server-fp32-1gpus

Model      | i9-7900X-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 316.52
resnet152  | 122.22
inception3 | 211.87
inception4 | 88.26
vgg16      | 186.70
alexnet    | 3868.16
ssd300     | 162.23

syn-replicated-fp16-1gpus

Model      | i9-7900X-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 448.98
resnet152  | 159.09
inception3 | 261.64
inception4 | 96.25
vgg16      | 215.97
alexnet    | 4507.86
ssd300     | 186.27

syn-parameter_server-fp16-1gpus

Model      | i9-7900X-GeForce_RTX_2080_Ti (images/sec)
-----------|------------
resnet50   | 454.84
resnet152  | 162.12
inception3 | 259.83
inception4 | 98.24
vgg16      | 220.16
alexnet    | 4566.05
ssd300     | 187.44

Summary (Mike Metral, GTX 1080 Ti)

syn-replicated-fp32-1gpus

Model      | v2-GeForce_GTX_1080_Ti (images/sec)
-----------|------------
resnet50   | 221.33
resnet152  | 84.99
inception3 | 142.51
inception4 | 60.11
vgg16      | 142.39
alexnet    | 2868.88
ssd300     | 112.22

syn-parameter_server-fp32-1gpus

Model      | v2-GeForce_GTX_1080_Ti (images/sec)
-----------|------------
resnet50   | 221.24
resnet152  | 85.04
inception3 | 142.39
inception4 | 60.12
vgg16      | 142.17
alexnet    | 2870.47
ssd300     | 112.14

syn-replicated-fp16-1gpus

Model      | v2-GeForce_GTX_1080_Ti (images/sec)
-----------|------------
resnet50   | 275.24
resnet152  | 99.76
inception3 | 161.39
inception4 | 64.63
vgg16      | 153.03
alexnet    | 2981.33
ssd300     | 126.42

syn-parameter_server-fp16-1gpus

Model      | v2-GeForce_GTX_1080_Ti (images/sec)
-----------|------------
resnet50   | 275.78
resnet152  | 100.20
inception3 | 160.48
inception4 | 65.22
vgg16      | 156.34
alexnet    | 3022.28
ssd300     | 127.33

Hardware / Software

Component                | Version
-------------------------|------------
Distro                   | Ubuntu 18.04.1
Kernel                   | 4.18.5 x86_64
GPU / Compute Capability | NVIDIA GeForce GTX 1080 Ti - 6.1
Tensorflow               | v1.11.0
NVIDIA Driver            | 410.57
CUDA                     | 10.0.130_410.48
cuDNN                    | 7.3.0.29
NCCL                     | 2.3.5
GCC                      | Ubuntu 6.4.0-17ubuntu1
Python                   | 3.6.6
Bazel                    | 0.16.1
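To compare GPU generations, you can take the same kind of ratio between two submissions. The sketch below divides the X6-GeForce_RTX_2080_Ti syn-replicated-fp32-1gpus results by the v2-GeForce_GTX_1080_Ti ones, with the values copied in by hand. The two systems also differ in TensorFlow, NVIDIA driver, and CUDA versions, and possibly in other hardware. The ratio therefore reflects the whole setup, not the GPU alone. The output file name gpu-ratio.txt is hypothetical.

```shell
#!/usr/bin/env bash
# Columns: model, GTX 1080 Ti result, RTX 2080 Ti result (syn-replicated, fp32, 1 GPU).
# awk prints RTX 2080 Ti / GTX 1080 Ti for each model; tee also saves it to gpu-ratio.txt.
awk '{ printf "%-10s %.2fx\n", $1, $3 / $2 }' <<'EOF' | tee gpu-ratio.txt
resnet50    221.33  294.45
resnet152    84.99  107.72
inception3  142.51  193.16
inception4   60.11   76.11
vgg16       142.39  176.88
alexnet    2868.88 3665.52
ssd300      112.22  150.39
EOF
```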