Applications service fails after upgrade if an app requires a GPU, seemingly due to missing Nvidia drivers

Description

Steps to Reproduce
1. Set up an app with GPU passthrough (e.g. Jellyfin) on an earlier version of TrueNAS Scale (e.g. 24.10 RC.2).
2. Upgrade the system to a later version.
3. Observe that the Applications tab in the web UI shows "Apps Service Pending" indefinitely.

Expected Result
Applications should come back automatically after the upgrade.

Actual Result
All applications are missing because the Applications service seemingly fails to start without any error, giving the unfortunate appearance that all apps were lost in the upgrade.

Workaround:
1. Open Configuration > Settings and un-check "Install Nvidia Drivers" and save.
2. Open Configuration > Settings again, re-check "Install Nvidia Drivers", save, and wait for the drivers to install.
3. Click Configuration > Unset Pool
4. Click Configuration > Set Pool and re-select the pool your apps were installed on
5. Wait a few seconds and reload the page.

All apps should now return and/or redeploy.
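For scripted recovery, workaround steps 1 to 4 can be expressed as a sequence of configuration updates. This is a minimal sketch, assuming the middleware's `docker.update` method accepts an `nvidia` field (boolean) and a `pool` field (pool name, or null to unset) and can be invoked with `midclt call`; the helper names and the pool name `tank` are hypothetical, and the field names should be checked against the API documentation for your release before use. The sketch only prints the commands; it does not run them.

```python
import json
import shlex
from typing import Any, Dict, List

# ASSUMPTION: docker.update accepts "nvidia" (bool) and "pool" (str or None).
# Verify these field names against the API docs for your TrueNAS release.


def workaround_payloads(pool_name: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: workaround steps 1-4 as docker.update payloads."""
    return [
        {"nvidia": False},    # Step 1: un-check "Install Nvidia Drivers"
        {"nvidia": True},     # Step 2: re-check it (then wait for the install)
        {"pool": None},       # Step 3: Unset Pool
        {"pool": pool_name},  # Step 4: Set Pool back to the apps pool
    ]


def as_midclt_commands(payloads: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: render each payload as a shell-quoted midclt call."""
    return [
        "midclt call docker.update " + shlex.quote(json.dumps(payload))
        for payload in payloads
    ]


if __name__ == "__main__":
    # "tank" is a placeholder; replace it with the pool your apps live on.
    for command in as_midclt_commands(workaround_payloads("tank")):
        print(command)
```

The waiting in step 2 and the page reload in step 5 have no equivalent here: the driver install should be allowed to finish before the pool is unset and set again.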

Session ID: df4ff14a-98fd-7ae6-67db-fd9f1052d08e

Problem/Justification

None

Impact

None

Activity


Bug Clerk November 1, 2024 at 3:59 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Bug Clerk November 1, 2024 at 3:55 PM

Bug Clerk October 31, 2024 at 10:18 AM

Vladimir Vinogradenko October 30, 2024 at 10:40 AM

I propose changing docker.lacks_nvidia_drivers to docker.nvidia_status, which will return one of the following values:

{"status": "ABSENT"} - no nvidia cards are present

{"status": "NOT_INSTALLED"} - drivers are not installed

{"status": "INSTALLING"} - drivers are being installed

{"status": "INSTALL_ERROR", "error": "..."} - driver installation failed

{"status": "INSTALLED"} - drivers are installed

And also add a corresponding event source for it. Agree?
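The proposed return values map naturally onto a small enumeration. This is a minimal sketch, assuming the service can observe whether any Nvidia card is present, whether the drivers are installed, and whether an install is running or has failed. The function name `nvidia_status`, its parameters, and the order in which the states are checked are illustrative assumptions, not the middleware's actual implementation.

```python
from enum import Enum
from typing import Dict, Optional


class NvidiaStatus(str, Enum):
    """The five states proposed for docker.nvidia_status."""
    ABSENT = "ABSENT"                # no nvidia cards are present
    NOT_INSTALLED = "NOT_INSTALLED"  # drivers are not installed
    INSTALLING = "INSTALLING"        # drivers are being installed
    INSTALL_ERROR = "INSTALL_ERROR"  # driver installation failed
    INSTALLED = "INSTALLED"          # drivers are installed


def nvidia_status(
    *,
    cards_present: bool,
    installing: bool,
    installed: bool,
    install_error: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical status builder; the check order below is an assumption."""
    if not cards_present:
        return {"status": NvidiaStatus.ABSENT.value}
    if installing:
        # An in-progress install takes precedence over an earlier failure.
        return {"status": NvidiaStatus.INSTALLING.value}
    if installed:
        return {"status": NvidiaStatus.INSTALLED.value}
    if install_error is not None:
        # Only this state carries the extra "error" key.
        return {"status": NvidiaStatus.INSTALL_ERROR.value, "error": install_error}
    return {"status": NvidiaStatus.NOT_INSTALLED.value}
```

Unlike a single lacks_nvidia_drivers boolean, a status like this would let the UI tell "still installing" apart from "install failed", for example when a network problem at boot prevents the driver download.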

William Gryzbowski October 29, 2024 at 6:40 PM

OK, I see now that you had a network issue on boot that prevented the driver install, which is why you needed the workaround.

Complete

Details

Time remaining

0m

Created October 29, 2024 at 4:46 PM
Updated November 2, 2024 at 1:59 PM
Resolved November 1, 2024 at 3:59 PM