GCP GPU VM with Nvidia Docker
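A GPU can only be used after its driver is installed, so this post summarizes how to set up NVIDIA Docker on GCP. A startup script installs everything at boot time: CUDA 8.0 (which includes the NVIDIA driver), Docker CE 17.12.0, and nvidia-docker2. The startup script I prepared is as follows: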
File: startup.sh
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-8-0; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  echo Install cuda-8-0
  apt-get update
  # The cuda-8-0 package installs both the CUDA toolkit and the NVIDIA driver
  apt-get install cuda-8-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
# Install docker
echo Install docker-ce 17.12.0
# -y is needed because a startup script cannot answer apt's confirmation prompt
sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
# Install the pinned docker-ce version
apt-get update && apt-get install -y docker-ce=17.12.0~ce-0~ubuntu
# Install nvidia-docker2
echo Install nvidia-docker2
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
apt-get install -y nvidia-docker2
# Reload the Docker daemon configuration so it picks up the nvidia runtime
sudo pkill -SIGHUP dockerd
echo Install done....
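The pkill -SIGHUP dockerd line makes the running Docker daemon reload its configuration, which is how it learns about the nvidia runtime that nvidia-docker2 registers. A minimal sketch for checking that registration, assuming the entry lives in /etc/docker/daemon.json (Docker's default daemon config path on Linux); has_nvidia_runtime is a hypothetical helper and only a grep heuristic, not a JSON parser:

```shell
#!/bin/sh
# Hypothetical helper (grep heuristic, not a JSON parser): succeed if a Docker
# daemon config file mentions a "runtimes" section and an "nvidia" entry.
# Assumes nvidia-docker2 writes its entry to /etc/docker/daemon.json;
# pass another path as $1 to check a different file.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    # A missing file means no runtime has been registered.
    [ -f "$config" ] || return 1
    grep -q '"runtimes"' "$config" && grep -q '"nvidia"' "$config"
}

# Usage on the VM:
# if has_nvidia_runtime; then echo "nvidia runtime registered"; else echo "nvidia runtime missing"; fi
# The running daemon should also list nvidia under "Runtimes" in: docker info
```

Checking the file rather than the daemon keeps the sketch usable before Docker has been reloaded; docker info is the better check once it has.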
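Next, create the GPU machine with the command below. Here the --metadata-from-file startup-script=startup.sh flag attaches the script above as the instance's startup script, --accelerator type=nvidia-tesla-k80,count=1 attaches one Tesla K80, and --maintenance-policy "TERMINATE" is required because instances with GPUs cannot be live-migrated during host maintenance: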
gcloud beta compute \
--project "simon-lab" instances create "gpu-test" \
--zone "asia-east1-a" \
--machine-type "n1-standard-1" \
--subnet "default" \
--metadata-from-file startup-script=startup.sh \
--maintenance-policy "TERMINATE" \
--service-account "[email protected]" \
--scopes "https://www.googleapis.com/auth/cloud-platform" \
--accelerator type=nvidia-tesla-k80,count=1 \
--min-cpu-platform "Automatic" \
--tags "http-server","https-server" \
--image "ubuntu-1604-xenial-v20180126" \
--image-project "ubuntu-os-cloud" \
--boot-disk-size "100" \
--boot-disk-type "pd-standard" \
--boot-disk-device-name "gpu-test"
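Instead of logging in, you can also wait from your workstation until the installation finishes. A minimal sketch, assuming the guest environment on the Ubuntu image mirrors startup-script output to serial port 1 so that gcloud compute instances get-serial-port-output can see it; wait_for_marker is a hypothetical helper, and the attempt count and interval in the usage line are arbitrary:

```shell
#!/bin/sh
# Hypothetical helper: re-run a command until its output contains a marker
# string, giving up after a fixed number of attempts.
#   $1 = marker text, $2 = max attempts, $3 = seconds between attempts,
#   remaining arguments = the command to run.
wait_for_marker() {
    marker="$1"; attempts="$2"; delay="$3"
    shift 3
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Errors (e.g. the VM is still booting) are ignored; only output matters.
        if "$@" 2>/dev/null | grep -qF "$marker"; then
            echo "found \"$marker\" after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        # Only sleep if another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    echo "gave up waiting for \"$marker\"" >&2
    return 1
}

# Usage from a workstation (assumes startup-script output reaches serial port 1):
# wait_for_marker "Install done" 60 30 \
#     gcloud compute instances get-serial-port-output gpu-test \
#     --zone asia-east1-a --project simon-lab
```

Passing the command as arguments keeps the polling loop independent of gcloud, so the same helper can wrap any other status command.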
Once the machine is up, it keeps running the installation in the background. Log in to the host and follow the progress with:
tail -f /var/log/syslog
Every log line produced by the startup script is prefixed with a startup-script tag. Once you see Install done, the installation is complete.
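If you would rather not keep tail -f open, a one-shot check also works. A minimal sketch, assuming the startup-script output lands in /var/log/syslog with the startup-script tag on each line as described above; install_finished is a hypothetical helper, and reading /var/log/syslog normally requires root (or membership in the adm group):

```shell
#!/bin/sh
# Hypothetical helper: one-shot check for whether the startup script has
# logged its final "Install done" line. Assumes startup-script lines carry
# the text "startup-script"; the log path defaults to /var/log/syslog.
install_finished() {
    log="${1:-/var/log/syslog}"
    grep 'startup-script' "$log" 2>/dev/null | grep -qF 'Install done'
}

# Usage on the VM (as root):
# grep 'startup-script' /var/log/syslog | tail -n 20   # see which step is running
# install_finished && echo "installation complete"
```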
Testing
Since the NVIDIA driver is now installed on the VM itself, you can run nvidia-smi directly on the host to check the GPU and driver status.
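nvidia-smi can also be queried field by field, which is handy for confirming that the nvidia-smi -pm 1 line in the startup script took effect. A minimal sketch, assuming the --query-gpu fields name, driver_version and persistence_mode (listed by nvidia-smi --help-query-gpu); trim and check_persistence are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helpers: check_persistence reads "name, driver_version,
# persistence_mode" CSV rows on stdin (the format of the nvidia-smi query in
# the usage line below), prints one summary line per GPU, and fails if no GPU
# is listed or any GPU does not report persistence mode as "Enabled".

# Strip the spaces nvidia-smi leaves around each CSV field.
trim() {
    echo "$1" | sed 's/^ *//; s/ *$//'
}

check_persistence() {
    status=0
    seen=0
    while IFS=',' read -r name driver pm; do
        seen=1
        name=$(trim "$name")
        driver=$(trim "$driver")
        pm=$(trim "$pm")
        echo "GPU: $name | driver: $driver | persistence mode: $pm"
        [ "$pm" = "Enabled" ] || status=1
    done
    # No rows at all means nvidia-smi saw no GPU.
    [ "$seen" -eq 1 ] || status=1
    return "$status"
}

# Usage on the VM:
# nvidia-smi --query-gpu=name,driver_version,persistence_mode --format=csv,noheader \
#     | check_persistence
```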
You can also run nvidia-smi inside a container with Docker's nvidia runtime to confirm that nvidia-docker is installed correctly:
root@gpu-test:~# docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Sun Jan 28 13:34:00 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
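Because nvidia-docker2 exposes the host's driver to the container rather than shipping one inside the image, the Driver Version printed in the container (390.12 in the output above) should match the one nvidia-smi reports on the host. A minimal sketch for comparing them; driver_version_from_table is a hypothetical helper that scrapes the "Driver Version:" field of nvidia-smi's default table, so it may need adjusting if that layout changes:

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi's default table on stdin and print the
# number after "Driver Version:", e.g. 390.12 from the line
#   | NVIDIA-SMI 390.12   Driver Version: 390.12   |
# Prints nothing if no such field is found.
driver_version_from_table() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on the VM: compare the host's driver with the one the container sees.
# host=$(nvidia-smi | driver_version_from_table)
# container=$(docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi | driver_version_from_table)
# [ "$host" = "$container" ] && echo "container uses host driver $host"
```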