Thanks for using the TrueNAS Community Edition issue tracker! TrueNAS Enterprise users receive direct support for their reports from our support portal.

Cannot allocate NVIDIA T400 after reinstall

Description

After reinstalling Dragonfish on a new server (importing the config from the old one), I can no longer assign my NVIDIA T400 GPU to Plex.
nvidia-smi sees the GPU, but in the app I can only choose to assign 0 NVIDIA GPUs.

Host ID: 151e8303b3ad634eb8cb6b30c19c5df7bdd8537f99521869b8e7c7862770d3b8

Session ID: b9f3225c-b69d-e26e-ad67-ce5d5fcc6e00

Problem/Justification

None

Impact

None


Activity


Bug Clerk May 13, 2024 at 6:41 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Andreas Faye-Lund May 13, 2024 at 5:52 PM

And it fails to deploy:

0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
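The scheduler message above means no node had an available nvidia.com/gpu resource to satisfy the pod's request. As a rough diagnostic, the sketch below reads the Kubernetes node list and reports how many nvidia.com/gpu units each node advertises as allocatable. A count of 0 (or a missing entry) most likely means the card is not being exposed to Kubernetes at all, while a non-zero count points instead at the GPU already being claimed by another pod. It assumes the apps run on k3s (as Dragonfish apps do) and that the input is the output of k3s kubectl get nodes -o json; the script name check_gpus.py is hypothetical.

```python
import json
import sys

# Extended resource named in the scheduler error.
GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(node_list_json: str) -> dict[str, int]:
    """Map each node name to its allocatable nvidia.com/gpu count.

    Expects a Kubernetes NodeList as JSON (e.g. `k3s kubectl get nodes -o json`).
    Nodes that do not advertise the resource at all are reported as 0.
    """
    data = json.loads(node_list_json)
    result = {}
    for node in data.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended-resource quantities are whole numbers encoded as strings.
        result[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return result


if __name__ == "__main__":
    # Usage: k3s kubectl get nodes -o json | python3 check_gpus.py
    for name, count in allocatable_gpus(sys.stdin.read()).items():
        print(f"{name}: {count} allocatable {GPU_RESOURCE}")
```

On a single-node TrueNAS box there is only one entry, so the output is a single line per run.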

Andreas Faye-Lund May 13, 2024 at 5:44 PM

After changing the PCIe slot to EFI and enabling SR-IOV, it seems to have worked.

Andreas Faye-Lund May 13, 2024 at 5:27 PM

I was too quick: after I selected to assign 1 GPU, it threw an error:

invalid choice: 1

Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 469, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 511, in __run_body
    rv = await self.method(*args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 210, in nf
    rv = await func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 546, in do_update
    config, context = await self.normalise_and_validate_values(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 352, in normalise_and_validate_values
    dict_obj = await self.middleware.call(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 54, in validate_values
    verrors.check()
  File "/usr/lib/python3/dist-packages/middlewared/service_exception.py", line 70, in check
    raise self
middlewared.service_exception.ValidationErrors: [EINVAL] values.plexGPU.nvidia.com/gpu: Invalid choice: 1

Andreas Faye-Lund May 13, 2024 at 5:25 PM

Just fixed it. It seems the app now makes a call out to ngc.download.nvidia.com; after allowing that host through my proxy, I could assign the GPU again.
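Since the fix was allowing ngc.download.nvidia.com through the proxy, a small reachability probe is one way to confirm the proxy now lets that host through. This is a minimal sketch, assuming the proxy is exposed to the shell through the standard https_proxy/no_proxy environment variables, which Python's urllib reads automatically. TrueNAS's own middleware may take its proxy setting from elsewhere, so a True result is a hint rather than proof that the app will deploy. Any HTTP answer, including an error status such as 403 or 404 from the server itself, counts as reachable; a proxy refusing the HTTPS tunnel counts as unreachable.

```python
import http.client
import urllib.error
import urllib.request

# Host the reporter had to allow through the proxy.
NGC_URL = "https://ngc.download.nvidia.com/"


def can_reach(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server behind `url` sends back any HTTP response.

    urllib honours the standard proxy environment variables (https_proxy,
    no_proxy and their upper-case forms). For an https URL the proxy is asked
    to open a CONNECT tunnel; if it refuses, urllib raises URLError, which is
    reported here as False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # HTTPError subclasses URLError, so it must be caught first:
        # the server answered with an error status, so the host is reachable.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # DNS failure, refused connection, proxy block, timeout, bad response.
        return False


if __name__ == "__main__":
    print(f"{NGC_URL} reachable: {can_reach(NGC_URL)}")
```

Running it once before and once after changing the proxy rule should flip the result from False to True if the proxy was the only thing blocking the host.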

Duplicate

Details

Created May 13, 2024 at 5:09 PM
Updated May 13, 2024 at 6:51 PM
Resolved May 13, 2024 at 6:41 PM