GCP GPU VM with Nvidia Docker

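A GPU cannot be used until its driver is installed. The notes below cover installing Nvidia Docker on GCP; the installation can run at boot through a startup script. The startup script is as follows: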

File: startup.sh

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-8-0; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  echo "Install CUDA 8.0"
  apt-get update
  apt-get install cuda-8-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1

# Install docker
echo Install docker-ce 17.12.0
sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
apt-get update && apt-get install -y docker-ce=17.12.0~ce-0~ubuntu

# Install nvidia-docker2
echo Install nvidia-docker2
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
          sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
          sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
echo Install done....

Next, create the GPU machine with the following command:

gcloud beta compute \
  --project "simon-lab" instances create "gpu-test" \
  --zone "asia-east1-a" \
  --machine-type "n1-standard-1" \
  --subnet "default" \
  --metadata-from-file startup-script=startup.sh \
  --maintenance-policy "TERMINATE" \
  --service-account "[email protected]" \
  --scopes "https://www.googleapis.com/auth/cloud-platform" \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --min-cpu-platform "Automatic" \
  --tags "http-server","https-server" \
  --image "ubuntu-1604-xenial-v20180126" \
  --image-project "ubuntu-os-cloud" \
  --boot-disk-size "100" \
  --boot-disk-type "pd-standard" \
  --boot-disk-device-name "gpu-test"

Once the machine is up, the installation keeps running in the background. Log in to the host and follow its progress with the following command:

tail -f /var/log/syslog

The startup script process prefixes every log line it writes with a startup script tag. Once "Install done" appears, the installation is complete.
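Instead of watching the log by hand, the same check can be scripted. The following is a minimal sketch, assuming the tag in each log line contains the text `startup-script` and that the final marker is the `Install done` message echoed by startup.sh. The helper names `startup_done` and `wait_for_startup` are hypothetical, not part of any GCP tooling.

```shell
#!/bin/bash
# Hypothetical helpers; the 'startup-script' tag text is an assumption
# about how the startup script lines are labelled in the log.

# Return 0 if the log already contains the final "Install done" marker.
startup_done() {
  local log_file="${1:-/var/log/syslog}"
  grep 'startup-script' "$log_file" 2>/dev/null | grep -q 'Install done'
}

# Poll the log until the marker shows up or the attempts run out.
wait_for_startup() {
  local log_file="${1:-/var/log/syslog}"
  local tries="${2:-60}"
  local delay="${3:-10}"
  local i
  for ((i = 0; i < tries; i++)); do
    if startup_done "$log_file"; then
      echo "startup script finished"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for startup script" >&2
  return 1
}
```

Calling `wait_for_startup` with no arguments polls /var/log/syslog every 10 seconds, up to 60 times. Reading syslog may require root on the VM.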

Testing

The Nvidia driver is now installed on the host itself, so you can run nvidia-smi there to check the state of the GPU driver.
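As a sketch of that host-side check, the snippet below wraps nvidia-smi in a small function that fails cleanly when the binary is not on the PATH yet, for example while the startup script is still running. The function name `gpu_summary` and the choice of query fields are illustrative assumptions; plain `nvidia-smi` with no arguments works just as well.

```shell
#!/bin/bash
# Hypothetical helper: print a one-line CSV summary per GPU.
# The optional first argument overrides the nvidia-smi binary to use.
gpu_summary() {
  local smi="${1:-nvidia-smi}"
  if ! command -v "$smi" >/dev/null 2>&1; then
    echo "$smi not found; the driver may not be installed yet" >&2
    return 1
  fi
  # Query the GPU name, driver version, persistence mode and total memory.
  "$smi" --query-gpu=name,driver_version,persistence_mode,memory.total --format=csv
}

gpu_summary || echo "GPU check failed"
```

On a correctly installed host the persistence mode column should read Enabled, since startup.sh runs `nvidia-smi -pm 1`.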

You can also run nvidia-smi inside a container that uses the nvidia Docker runtime, to confirm that nvidia-docker is installed correctly:

root@gpu-test:~# docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Sun Jan 28 13:34:00 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
