Set up a TensorFlow GPU Docker container using the Lambda Stack Dockerfile

Or, how Lambda Stack Dockerfiles + docker-ce + nvidia-docker = GPU-accelerated deep learning containers

Accelerated Docker Containers with GPUs!

Ever wonder how to build a GPU Docker container with TensorFlow in it? In this tutorial, we'll walk you through every step, including installing Docker and building a Docker image with Lambda Stack pre-installed. This gives you GPU-accelerated versions of TensorFlow, PyTorch, Caffe 2, and Keras inside a portable Docker container. The result is nearly identical to what you'd get from the NVIDIA GPU Cloud (NGC) container registry, but without the proprietary silliness.

For the Docker installation step we assume Ubuntu 18.04; after installation, however, this should also work on CentOS and Ubuntu 16.04.

Step 0: Ensure you have the NVIDIA drivers installed on your host system - if not, use Lambda Stack to get them

One prerequisite needs to be met first: you'll need the NVIDIA drivers for your GPU installed on the host machine. If you already have them, skip to Step 1. If not, you can use Lambda Stack to install them: https://lambdalabs.com/lambda-stack-deep-learning-software
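Not sure whether the drivers are already there? A quick check (assuming the NVIDIA driver utilities are installed and on your PATH) is to run nvidia-smi on the host:

# If this prints a table listing your GPU(s), the drivers are installed
# and you can skip to Step 1; if the command is missing or errors out,
# install Lambda Stack as shown below.
nvidia-smi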

For brevity, you can copy and paste this code to install Lambda Stack:

LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda
sudo reboot

Step 1: Install Docker with GPU Support (docker-ce + nvidia-docker)

Now that we have the GPU drivers installed, we're going to install Docker CE.

# First, remove old versions of Docker
sudo apt-get remove -y docker docker-engine docker.io

# Next, add Docker Repository
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL 'https://download.docker.com/linux/ubuntu/gpg' | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu  $(lsb_release -cs) stable"

# Install Docker CE
sudo apt-get update
sudo apt-get install -y docker-ce

# Verify the Docker installation
sudo docker run hello-world
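Tip: if you'd rather not prefix every docker command with sudo, you can add your user to the docker group (a standard Docker convenience; note that membership in this group is effectively root-equivalent):

# Add the current user to the docker group; log out and back in for it to take effect
sudo usermod -aG docker $USER

This tutorial keeps the sudo prefix so the commands work either way.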

Next, we're going to install nvidia-docker version 2.

# First, remove any containers created by the old nvidia-docker 1.0,
# then purge the old package itself.
sudo docker volume ls -q -f driver=nvidia-docker \
    | xargs -r -I{} -n1 sudo docker ps -q -a -f volume={} \
    | xargs -r sudo docker rm -f
sudo apt-get purge -y nvidia-docker

# Add NVIDIA's docker repository to your system.
# Install nvidia-docker2 and restart the Docker daemon.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L "https://nvidia.github.io/nvidia-docker/$(. /etc/os-release; echo $ID$VERSION_ID)/nvidia-docker.list" \
    | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
    && sudo apt-get update \
    && sudo apt-get install -y nvidia-docker2 \
    && sudo pkill -SIGKILL dockerd

# Test nvidia-smi within the Docker container.
sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

If you'd like that as a simple, illegible, copy-pastable code block, here you go:

# Install Docker CE + nvidia-docker version 2.0
sudo apt-get remove -y docker docker-engine docker.io || true \
    && sudo apt-get update \
    && sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common \
    && curl -fsSL 'https://download.docker.com/linux/ubuntu/gpg' | sudo apt-key add - \
    && sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
    && sudo apt-get update \
    && sudo apt-get install -y docker-ce \
    && sudo docker run hello-world \
    && sudo docker volume ls -q -f driver=nvidia-docker \
    | xargs -r -I{} -n1 docker ps -q -a -f volume={} \
    | xargs -r docker rm -f \
    && sudo apt-get purge -y nvidia-docker || true \
    && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
    && curl -s -L "https://nvidia.github.io/nvidia-docker/$(. /etc/os-release; echo $ID$VERSION_ID)/nvidia-docker.list" \
    | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
    && sudo apt-get update \
    && sudo apt-get install -y nvidia-docker2 \
    && sudo pkill -SIGKILL dockerd \
    && sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi \
    && echo Docker CE and nvidia-docker successfully installed.

Step 2: Build the Lambda Stack Docker Image

Great, now what can you actually do with this? Nothing so far. With nvidia/cuda:9.0-base, you can't even run Python!

sudo docker run --interactive --tty --runtime=nvidia --rm nvidia/cuda:9.0-base /bin/bash
root@75c956880e7:/# python
bash: python: command not found

Let's fix that by switching to Lambda Stack's Dockerfiles. Note that building these Docker images requires accepting the cuDNN license agreement.

# Build a Docker image for Ubuntu 18.04 (bionic) or 16.04 (xenial) from our repository.
sudo docker build -t lambda-stack -f Dockerfile.$(lsb_release -cs) https://github.com/lambdal/lambda-stack-dockerfiles.git
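If you'd like to inspect the Dockerfiles before building (or your network blocks remote build contexts), you can clone the repository and build from a local checkout instead:

# Alternative: clone first, then build from the local directory
git clone https://github.com/lambdal/lambda-stack-dockerfiles.git
cd lambda-stack-dockerfiles
sudo docker build -t lambda-stack -f Dockerfile.$(lsb_release -cs) .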

This will output a lot of text and will take around 9 minutes, depending on your internet connection speed. Once it's finished, you can verify that the build succeeded with:

sudo docker image list

You should see a Docker image with the repository "lambda-stack" and the tag "latest". Let's run a command with that image.

sudo docker run --rm --interactive --tty --runtime=nvidia lambda-stack:latest /bin/bash
root@75c956880e7:/# python
>>> import keras
Using TensorFlow backend.
>>> import torch
>>> import caffe
>>> import tensorflow as tf
>>> s = tf.Session()
...

We're now officially using Lambda Stack in a GPU-enabled Docker container. You have access to TensorFlow, PyTorch, Caffe, Keras, and more. That's what I call a smart move.
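To double-check that TensorFlow inside the container can actually see the GPU, here's a quick one-liner (a sketch against the TensorFlow 1.x API shown above):

# Should print True when the NVIDIA runtime exposes the GPU to the container
sudo docker run --rm --runtime=nvidia lambda-stack:latest \
    /usr/bin/python3 -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'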

Now we can run a few jobs without a TTY or interactivity.

# Run a 10,000 x 10,000 matmul job on the GPU via Docker and PyTorch
sudo docker run --rm --runtime=nvidia lambda-stack:latest /usr/bin/python3 -c 'import torch; sz=10000; torch.mm(torch.randn(sz, sz).cuda(), torch.randn(sz, sz).cuda())'
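For comparison, here's the same sort of job expressed in TensorFlow (again a sketch against the 1.x graph API assumed above):

# Run a 10,000 x 10,000 matmul job on the GPU via Docker and TensorFlow
sudo docker run --rm --runtime=nvidia lambda-stack:latest /usr/bin/python3 -c 'import tensorflow as tf; sz = 10000; c = tf.matmul(tf.random_normal([sz, sz]), tf.random_normal([sz, sz])); tf.Session().run(c)'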

Step 3: Upload your Lambda Stack Docker image to a container registry

Now that you've built your container, let's make it available to your colleagues and ready to deploy by uploading it to a container registry. First, sign up for Docker Hub: https://hub.docker.com/. Then log in, tag lambda-stack:latest with your username (i.e. myusername/lambda-stack:latest), and push it. (Note: with the current Lambda Stack Docker image weighing in at around 9 GB, I wouldn't recommend using a container registry outside of your local network. A lightweight version of the Lambda Stack Dockerfiles is coming soon.)

sudo docker login
sudo docker tag lambda-stack myusername/lambda-stack:latest
sudo docker push myusername/lambda-stack:latest

Now you can, from any other machine that has docker-ce and nvidia-docker installed, run your Lambda Stack container without having to re-build the image:

sudo docker run --rm --interactive --tty --runtime=nvidia myusername/lambda-stack:latest \
    /usr/bin/python3 -c \
    'import tensorflow as tf; s = tf.Session(); print("Wow, Lambda Stack Dockerfiles are great. I love Lambda!");'
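If you'd prefer to pre-fetch the image before running it (docker run will otherwise pull it on demand), you can pull explicitly:

sudo docker pull myusername/lambda-stack:latest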

Voilà. You're now up and running with a Lambda Stack Docker image. Better still, that image is hosted on a container registry that you control.

If you have any questions about using Lambda Stack Dockerfiles, email software@lambdalabs.com.
