Cannot allocate Nvidia T400 after reinstall
Activity

Bug Clerk May 13, 2024 at 6:41 PM
This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Andreas Faye-Lund May 13, 2024 at 5:52 PM
And it fails to deploy:
0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
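The scheduler message above means the node is not advertising any free `nvidia.com/gpu` resource. A minimal sketch for checking what each node reports, assuming the node list has been exported as JSON (for example with `kubectl get nodes -o json`; on Dragonfish this is presumably `k3s kubectl`, which is an assumption). The function name `gpu_allocatable` is hypothetical; only the `status.allocatable` field layout comes from the Kubernetes Node API.

```python
from typing import Any

# Extended resource name taken from the scheduler message above.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(nodes: dict[str, Any]) -> dict[str, int]:
    """Map each node name to its allocatable GPU count (0 if not advertised).

    `nodes` is the parsed JSON of a Kubernetes node list, i.e. a dict
    with an "items" array of Node objects.
    """
    result: dict[str, int] = {}
    for node in nodes.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Quantities are serialized as strings; a missing key means the
        # device plugin has not registered the resource on this node.
        result[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return result
```

Feeding it the result of `json.load()` on the exported node list gives one entry per node; a value of 0 would be consistent with the "Insufficient nvidia.com/gpu" message even when `nvidia-smi` sees the card on the host.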

Andreas Faye-Lund May 13, 2024 at 5:44 PM
After changing the PCIe slot to EFI and enabling SR-IOV, it seems to have worked.

Andreas Faye-Lund May 13, 2024 at 5:27 PM
I spoke too soon. After selecting to assign 1 GPU, it threw an error:
invalid choice: 1
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 469, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 511, in __run_body
    rv = await self.method(*args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 210, in nf
    rv = await func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 546, in do_update
    config, context = await self.normalise_and_validate_values(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 352, in normalise_and_validate_values
    dict_obj = await self.middleware.call(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 54, in validate_values
    verrors.check()
  File "/usr/lib/python3/dist-packages/middlewared/service_exception.py", line 70, in check
    raise self
middlewared.service_exception.ValidationErrors: [EINVAL] values.plexGPU.nvidia.com/gpu: Invalid choice: 1

Andreas Faye-Lund May 13, 2024 at 5:25 PM
Just fixed it. It seems the app now makes a call out to ngc.download.nvidia.com. After allowing that host in my proxy, I could assign the GPU again.
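A rough way to confirm that the proxy lets this host through, sketched in Python with only the standard library. The `https://` scheme, the helper name `can_reach`, and the timeout are assumptions; only the host name comes from the comment above. `urllib` picks up `https_proxy`/`HTTPS_PROXY` from the environment by default, so run it with the same proxy settings the NAS uses.

```python
import urllib.error
import urllib.request

# Host named in the comment above; the https scheme is an assumption.
NGC_URL = "https://ngc.download.nvidia.com"


def can_reach(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> bool:
    """Return True if any HTTP response comes back from `url`.

    `opener` is injectable so the check can be exercised without
    network access; it defaults to urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the host itself
        # was reachable. (A plain-HTTP proxy denial can also land here.)
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused or blocked connection, failed proxy
        # tunnel, or timeout.
        return False
```

Calling `can_reach(NGC_URL)` before and after changing the proxy rule should flip from `False` to `True` if the proxy was what blocked the request.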
Description
After reinstalling Dragonfish on a new server (importing the config from the old one), I cannot assign my Nvidia T400 GPU to Plex anymore.
nvidia-smi sees the GPU, but in the app I can only choose to assign 0 Nvidia GPUs.
Host ID: 151e8303b3ad634eb8cb6b30c19c5df7bdd8537f99521869b8e7c7862770d3b8
Session ID: b9f3225c-b69d-e26e-ad67-ce5d5fcc6e00