Nvidia GPU is not shareable between apps

Description

nvidia-device-plugin does not seem to allow GPU sharing between apps. A TC user tried to deploy a second app with an allocated GPU, but got
`0/1 nodes are available: 1 Insufficient nvidia.com/gpu.` As soon as the first app was stopped, the second one deployed.

Reading through the docs, I found this (quoted from the k8s docs):
```
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Each container can request one or more GPUs. It is not possible to request a fraction of a GPU.
```
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#v1-8-onwards
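
To make the behavior above concrete, here is a minimal toy model of whole-GPU accounting on a single node. It is only a sketch of the rule quoted from the docs, not the actual scheduler or nvidia-device-plugin code; the `Node` class, its methods, and the app names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node that advertises a whole number of GPUs (hypothetical model)."""
    gpu_capacity: int
    allocations: dict = field(default_factory=dict)  # app name -> GPUs held

    def free_gpus(self) -> int:
        return self.gpu_capacity - sum(self.allocations.values())

    def schedule(self, app: str, gpus: int) -> str:
        # GPU requests are whole numbers; a fraction cannot be expressed.
        if not isinstance(gpus, int) or gpus < 1:
            raise ValueError("GPU requests must be positive integers")
        # No overcommitting: a GPU held by one app is unavailable to others.
        if gpus > self.free_gpus():
            return "0/1 nodes are available: 1 Insufficient nvidia.com/gpu."
        self.allocations[app] = gpus
        return f"{app} scheduled"

    def stop(self, app: str) -> None:
        # Stopping an app returns its GPUs to the free pool.
        self.allocations.pop(app, None)


node = Node(gpu_capacity=1)
first = node.schedule("app-a", 1)
second = node.schedule("app-b", 1)  # rejected: the only GPU is held by app-a
node.stop("app-a")
retry = node.schedule("app-b", 1)   # succeeds once app-a has released the GPU
print(first, second, retry, sep="\n")
```

In this model the second app is rejected while the first holds the only GPU and is accepted after the first is stopped, which matches what was observed.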

With a bit more searching, I found this project, which can enable GPU sharing:
https://github.com/AliyunContainerService/gpushare-scheduler-extender

I'm fairly new to k8s, so I'm not sure how feasible that would be to implement.

To people coming from a Docker environment, which allows sharing, this seems "limiting".

Also: I noticed the driver is `460.xx`, while `470.xx` has been released.

Problem/Justification

None

Impact

None

Activity

Waqar Ahmed September 7, 2021 at 1:32 PM

Please file a suggestion ticket for the requested feature. Closing this issue.

Stavros Kois September 6, 2021 at 5:28 PM

Bug flag was my mistake indeed.
Subscribed to this upstream issue to follow it for more info!

Waqar Ahmed September 6, 2021 at 5:25 PM

I don't think this should be considered a bug, as this is upstream behavior: GPUs can't be shared between pods. There is an open issue on Kubernetes which you can track as well ( https://github.com/kubernetes/kubernetes/issues/52757 ). As for the scheduler link you shared, I would advise you to create a suggestion ticket. However, let's see if the newer NVIDIA driver version is available on the apt mirror we are using, so that we can update it.