Cannot allocate Nvidia T400 after reinstall

Description

After reinstalling Dragonfish on a new server (importing the config from the old one), I can no longer assign my Nvidia T400 GPU to Plex.
nvidia-smi sees the GPU, but in the app settings the only option offered is to assign 0 Nvidia GPUs to the app.
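
For anyone reproducing this, a minimal Python sketch of the host-side check described above (does nvidia-smi list the card?). It assumes nvidia-smi is on the PATH; the helper names are made up for illustration, and the sample name in the usage note is hypothetical.

```python
import subprocess


def parse_gpu_names(csv_output: str) -> list[str]:
    """Return one GPU name per non-empty line of nvidia-smi CSV output."""
    return [line.strip() for line in csv_output.splitlines() if line.strip()]


def host_gpu_names() -> list[str]:
    """Ask nvidia-smi for the names of the GPUs the driver can see."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits non-zero (e.g. no driver loaded)
    )
    return parse_gpu_names(result.stdout)
```

Calling `host_gpu_names()` on the affected host should return a non-empty list if the driver sees the card. That only confirms the host side; it says nothing about whether the apps layer offers the GPU.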

Host ID: 151e8303b3ad634eb8cb6b30c19c5df7bdd8537f99521869b8e7c7862770d3b8

Session ID: b9f3225c-b69d-e26e-ad67-ce5d5fcc6e00

Problem/Justification

None

Impact

None

Activity

Bug Clerk May 13, 2024 at 6:41 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Andreas Faye-Lund May 13, 2024 at 5:52 PM

And it fails to deploy:

0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
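
The scheduler message above means the node is not advertising any free `nvidia.com/gpu` resource. A rough way to confirm that is to read the node's allocatable resources from the Kubernetes node JSON. The sketch below assumes the JSON comes from something like `k3s kubectl get nodes -o json`; the function name is hypothetical.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(nodes_json: str) -> dict[str, int]:
    """Map each node name to its allocatable nvidia.com/gpu count (0 if absent)."""
    nodes = json.loads(nodes_json)["items"]
    counts = {}
    for node in nodes:
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resources as strings, e.g. "1".
        counts[node["metadata"]["name"]] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts
```

If the count for the node is 0 while nvidia-smi still lists the card, the problem is likely between the driver and the Kubernetes device plugin rather than in the app itself.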

Andreas Faye-Lund May 13, 2024 at 5:44 PM

After changing the PCIe slot to EFI and enabling SR-IOV, it seems to have worked.

Andreas Faye-Lund May 13, 2024 at 5:27 PM

I spoke too soon. After selecting "assign 1 GPU" it threw an error:

invalid choice: 1

Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 469, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 511, in __run_body
    rv = await self.method(*args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 210, in nf
    rv = await func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 546, in do_update
    config, context = await self.normalise_and_validate_values(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 352, in normalise_and_validate_values
    dict_obj = await self.middleware.call(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 54, in validate_values
    verrors.check()
  File "/usr/lib/python3/dist-packages/middlewared/service_exception.py", line 70, in check
    raise self
middlewared.service_exception.ValidationErrors: [EINVAL] values.plexGPU.nvidia.com/gpu: Invalid choice: 1

Andreas Faye-Lund May 13, 2024 at 5:25 PM

Just fixed it. It seems the app now makes a call out to ngc.download.nvidia.com; after allowing that host in my proxy I could assign the GPU again.
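
A quick way to check whether that host is reachable from the server is sketched below. This is only an illustration of the check described in the comment above: the helper names and the example proxy address are hypothetical, and it does not reproduce whatever request the app itself makes.

```python
import urllib.error
import urllib.request

NGC_URL = "https://ngc.download.nvidia.com"


def proxy_map(proxy_url):
    """Build the scheme-to-proxy mapping that urllib's ProxyHandler expects."""
    return {"http": proxy_url, "https": proxy_url} if proxy_url else {}


def can_reach(url=NGC_URL, proxy_url=None, timeout=10.0):
    """Return True if the host answers at all, False if the connection is blocked."""
    # With no explicit proxy, urllib falls back to the environment's proxy settings.
    handlers = [urllib.request.ProxyHandler(proxy_map(proxy_url))] if proxy_url else []
    opener = urllib.request.build_opener(*handlers)
    try:
        opener.open(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server replied with an HTTP error status, so the path to it is open.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or a proxy refusing the tunnel.
        return False
```

For example, `can_reach(proxy_url="http://proxy.example:3128")` would test the route through a proxy at that (made-up) address.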

Duplicate

Created May 13, 2024 at 5:09 PM
Updated May 13, 2024 at 6:51 PM
Resolved May 13, 2024 at 6:41 PM