Nvidia GPU is not shareable between apps
Description
Activity
Waqar Ahmed September 7, 2021 at 1:32 PM
@Stavros Kois please file a suggestion ticket for the requested feature; closing this issue.
Stavros Kois September 6, 2021 at 5:28 PM
Bug flag was my mistake indeed.
Subscribed to this upstream issue to follow it for more info!
Waqar Ahmed September 6, 2021 at 5:25 PM
@Stavros Kois I don't think this should be considered a bug, as this is upstream behavior: GPUs can't be shared between pods. There is an open Kubernetes issue which you can track as well ( https://github.com/kubernetes/kubernetes/issues/52757 ). As for the scheduler link you shared, I would advise you to create a suggestion ticket. However, let's see if the newer NVIDIA driver version is available on the apt mirror we are using, and whether we can update to it.
Behaves as Intended
Details
Assignee
Triage Team
Reporter
Stavros Kois
Created September 6, 2021 at 5:18 PM
Updated July 6, 2022 at 9:01 PM
Resolved September 7, 2021 at 1:31 PM
nvidia-device-plugin does not seem to allow GPU sharing between apps. A TC user tried to deploy a second app with an allocated GPU, but got
`0/1 nodes are available: 1 Insufficient nvidia.com/gpu.` As soon as the first app was stopped, the second one deployed.
Reading through some docs, I found this (quoted from the k8s docs):
```
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Each container can request one or more GPUs. It is not possible to request a fraction of a GPU.
```
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#v1-8-onwards
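To make the behavior above concrete, here is a rough toy model in Python of whole-GPU allocation. It is only a sketch of the rule quoted from the docs, not the actual Kubernetes scheduler or nvidia-device-plugin code; the `Node` class, its methods, and the app names `app-a` and `app-b` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical single node that hands out whole GPUs only."""
    gpu_capacity: int
    allocated: dict = field(default_factory=dict)  # app name -> GPUs held

    def free_gpus(self) -> int:
        return self.gpu_capacity - sum(self.allocated.values())

    def schedule(self, app: str, gpus: int) -> bool:
        # GPUs are integer resources: no fractions and no overcommit.
        if gpus > self.free_gpus():
            return False  # analogous to "Insufficient nvidia.com/gpu"
        self.allocated[app] = gpus
        return True

    def stop(self, app: str) -> None:
        # Stopping an app releases its GPUs back to the node.
        self.allocated.pop(app, None)


node = Node(gpu_capacity=1)
first = node.schedule("app-a", 1)   # takes the only GPU
second = node.schedule("app-b", 1)  # rejected while app-a holds the GPU
node.stop("app-a")
retry = node.schedule("app-b", 1)   # succeeds once the GPU is free
```

With one GPU on the node, the second request is rejected until the first app is stopped, which matches what the user observed.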
With a bit more searching, I found this project, which can enable GPU sharing:
https://github.com/AliyunContainerService/gpushare-scheduler-extender
I'm fairly new to k8s, so I'm not sure how feasible that would be to implement.
To people coming from a Docker environment, which allows sharing, this seems "limiting".
Also: I noticed the driver is `460.xx`, while `470.xx` has been released.