k8s-1.6-gpu

Verifying GPU Scheduling in Kubernetes 1.6

1. Purpose

Investigate GPU scheduling support in Kubernetes 1.6 and determine whether it meets our project requirements.

2. Environment

Hardware: NVIDIA Tesla P4 * 4
NVIDIA driver: NVIDIA-Linux-x86_64-375.51
docker: 17.03.1-ce
nvidia-docker: nvidia-docker-1.0.1-1.x86_64
kubernetes: v1.6.4

3. Preparation

  • Install the NVIDIA driver

Note that the driver installation will fail if the dependency versions are wrong, so here we use the rpm packages from the system ISO image, with the image mounted at /mnt.
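
Mounting the image might look like the following (a sketch; the ISO filename is hypothetical and depends on your distribution media):

mount -o loop /root/CentOS-7-x86_64-DVD-1611.iso /mnt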

Create a local yum repository:

# vi /etc/yum.repos.d/local.repo
[local_server]
name=This is local repo
baseurl=file:///mnt/
enabled=1
gpgcheck=0

Install the dependencies:

yum install -y gcc kernel-devel
rpm -ivh /mnt/Packages/kernel-devel-3.10.0-514.el7.x86_64.rpm
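
The kernel-devel version must match the running kernel, otherwise the NVIDIA installer cannot build its kernel module. A quick sanity check (a minimal sketch using this environment's version):

uname -r             # expect 3.10.0-514.el7.x86_64, matching kernel-devel above
rpm -q kernel-devel  # should report the same version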

Install the driver:

./NVIDIA-Linux-x86_64-375.51.run
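
After the installer finishes, it is worth checking that the kernel module loaded and the device nodes exist (a minimal check; /dev/nvidia-uvm may still be missing at this point, see the FAQ at the end):

nvidia-smi       # should list the Tesla P4 cards
ls /dev/nvidia*  # expect /dev/nvidiactl, /dev/nvidia0, ...
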
  • Prepare the driver libraries

The GPU containers created later need the NVIDIA driver libraries mounted in. The driver files are scattered across the filesystem, so we can use nvidia-docker to collect them for us.

Install nvidia-docker:

rpm -ivh nvidia-docker-1.0.1-1.x86_64.rpm

After installation, start the nvidia-docker daemon and run a GPU container:

$ systemctl start nvidia-docker
$ nvidia-docker run --rm nvidia/cuda nvidia-smi
Fri Jun 23 11:19:53 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 0000:02:00.0     Off |                    0 |
| N/A   39C    P8     7W /  75W |      0MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 0000:03:00.0     Off |                    0 |
| N/A   41C    P8     7W /  75W |      0MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

You can see that a driver volume directory has been generated. This is also the directory the Pods will mount later:

$ ls /var/lib/nvidia-docker/volumes/nvidia_driver/375.51
bin lib lib64

At this point nvidia-docker has served its purpose and can be stopped, or even uninstalled:

systemctl stop nvidia-docker

Now we can start a GPU container using plain docker:

$ docker run --rm \
  -v /var/lib/nvidia-docker/volumes/nvidia_driver/375.51:/usr/local/nvidia \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  --device /dev/nvidia0:/dev/nvidia0 \
  nvidia/cuda nvidia-smi
Sat Aug 12 03:42:08 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 0000:04:00.0     Off |                    0 |
| N/A   50C    P0    24W /  75W |      0MiB /  7606MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
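
Note that only /dev/nvidia0 is passed through above, so the container sees a single card. To expose the second card as well, add its device node to the flags (illustrative; numbering follows the host's /dev/nvidiaN entries):

  --device /dev/nvidia1:/dev/nvidia1 \
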
  • Add --feature-gates=Accelerators=true to the kubelet startup arguments and restart kubelet, as sketched below.
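
How the flag is wired in depends on how kubelet is managed. On a systemd-managed node it might look like this (a hypothetical example; the drop-in path and environment variable vary by install method):

# vi /etc/systemd/system/kubelet.service.d/10-accelerators.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--feature-gates=Accelerators=true"

systemctl daemon-reload
systemctl restart kubelet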

4. Verification

Deployment YAML:

# vi demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-demo-1
spec:
  containers:
  - name: cuda
    image: nvidia/cuda
    command: ["bash", "-c", "nvidia-smi && while true; do sleep 1000; done"]
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: "1"
    volumeMounts:
    - name: nvidia-driver
      mountPath: /usr/local/nvidia
  volumes:
  - name: nvidia-driver
    hostPath:
      path: /var/lib/nvidia-docker/volumes/nvidia_driver/375.51

After creating the Pod (kubectl create -f demo-1.yaml), check its logs:

# kubectl logs cuda-demo-1
Mon Jun 26 03:11:06 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 0000:02:00.0     Off |                    0 |
| N/A   37C    P8     7W /  75W |      0MiB /  7606MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This shows that one GPU card was successfully allocated to the container.

Next, create a second Pod, cuda-demo-2, identical to cuda-demo-1 except that the GPU limit is raised to 2.
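
A sketch of the second manifest, differing from demo-1.yaml only in the Pod name and the GPU limit:

# vi demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-demo-2
spec:
  containers:
  - name: cuda
    image: nvidia/cuda
    command: ["bash", "-c", "nvidia-smi && while true; do sleep 1000; done"]
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: "2"
    volumeMounts:
    - name: nvidia-driver
      mountPath: /usr/local/nvidia
  volumes:
  - name: nvidia-driver
    hostPath:
      path: /var/lib/nvidia-docker/volumes/nvidia_driver/375.51

After creating it, check the Pod status: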

# kubectl get po
NAME          READY     STATUS    RESTARTS   AGE
cuda-demo-1   1/1       Running   0          6m
cuda-demo-2   0/1       Pending   0          3s

The Pod is stuck in Pending; describe it to see why:

# kubectl describe po cuda-demo-2
...
Events:
  FirstSeen   LastSeen   Count   From                SubObjectPath   Type      Reason             Message
  ---------   --------   -----   ----                -------------   ------    ------             -------
  1m          27s        8       default-scheduler                   Warning   FailedScheduling   No nodes are available that match all of the following predicates:: Insufficient alpha.kubernetes.io/nvidia-gpu (3).

This is exactly what we expected: scheduling failed because of insufficient GPU resources.

After deleting cuda-demo-1, cuda-demo-2 transitions to Running normally.
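
For reference, the commands involved (output omitted):

kubectl delete po cuda-demo-1
kubectl get po   # cuda-demo-2 should move from Pending to Running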

5. Notes

  • GPU quotas are specified via limits
  • GPUs are isolated between containers, not shared
  • Fractional GPU scheduling is not supported; the minimum granularity is one card (see the sketch after this list)
  • GPU hardware is homogenized: all cards look the same to the user
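
For example, a fractional GPU limit such as the following should be rejected by API validation, since the alpha GPU resource must be a whole number (a hedged illustration; the exact error message may differ):

    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: "0.5"   # invalid: must be an integer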

6. Shortcomings

  1. The driver libraries must be prepared in advance. Currently nvidia-docker does this for us.
  2. It is prone to resource fragmentation. For example, two nodes with one free GPU each cannot host a Pod that requests two GPUs, even though two GPUs are free cluster-wide.

7. Conclusion

Compared with GPU scheduling in Kubernetes 1.3 (where a host could only expose a single GPU card), scheduling in 1.6 is much improved. The basic concepts are the same as for CPU and memory, except that the minimum GPU granularity is a whole card. In practice this causes resource fragmentation, so the scheduling capability will need strengthening later. Overall it basically meets the needs of our current pilot project.

FAQ

After installing the NVIDIA driver, /dev/nvidia-uvm was not generated, so Kubernetes did not recognize the GPUs.
Following the approach from https://www.zhihu.com/question/36588693, running a CUDA sample once is enough to create it; see:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications
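
Alternatively, a minimal sketch that loads the UVM module and creates the device node directly, assuming the nvidia-modprobe utility shipped with the driver is available:

nvidia-modprobe -u -c=0   # load nvidia-uvm and create /dev/nvidia-uvm
ls -l /dev/nvidia-uvm     # verify the node now exists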
