NVIDIA GPU not being used by Apps - Assigned VGA Arbitration by TrueNAS?
Activity
Bug Clerk October 31, 2024 at 6:28 PM
This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.
Stavros Kois October 31, 2024 at 6:27 PM
@Michael Wesley Nice, I was expecting that to work.
I’ve mainly seen existing installations have this issue. You can track progress in the linked ticket if you want to (https://ixsystems.atlassian.net/browse/NAS-132086).
I’ll close this one now, as it’s essentially a duplicate.
Thanks!
Michael Wesley October 31, 2024 at 5:41 PM (Edited)
Well that is extremely interesting.
I just created a new Plex container as requested, and now this shows up:
And according to nvidia-smi, Plex is now using my GPU:
root@truenas[~]# nvidia-smi
Thu Oct 31 18:38:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 ... Off | 00000000:01:00.0 Off | N/A |
| 40% 47C P2 38W / 125W | 374MiB / 6144MiB | 5% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 126273 C ...lib/plexmediaserver/Plex Transcoder 370MiB |
+-----------------------------------------------------------------------------------------+
I can’t believe it was that simple. Perhaps ElectricEel did fix my issue after all…
I’ll try with Immich as well, although that will take a lot longer as it will need to search through all my photos again.
I guess I’ll just stick with this new Plex container and re-import all my media.
Stavros Kois October 31, 2024 at 5:22 PM
Yes please. You can try Plex only, without uninstalling your current app.
Just make sure the current Plex app is stopped when you start the second Plex app.
Michael Wesley October 31, 2024 at 5:19 PM (Edited)
Hi @Stavros Kois, I installed the Plex and Immich apps a few days ago after I updated from TrueNAS-CORE to TrueNAS-SCALE (Dragonfish). I updated to ElectricEel today in the hope that this would solve the problem. As such, the containers were ported over from Dragonfish to ElectricEel. I was experiencing the same issue in Dragonfish.
Do you still want me to try and do a fresh install on ElectricEel?
Description
I can’t get my NVIDIA GTX 1660 working with Plex (transcoding) and Immich (machine learning). I recently upgraded from TrueNAS-CORE to TrueNAS-SCALE.
I can confirm that Plex is not using my GPU for transcoding, as my CPU usage spikes considerably when it’s transcoding, and there are no processes present when I run nvidia-smi while transcoding.
When I run the machine learning pods in Immich I continually get
ERROR Worker was sent code 139
which is a SIGSEGV (memory violation) error.
I think the issue is that my GPU is being used by something else and is not available to the system, as [VGA controller] is listed after the GPU when I run lspci – if I understand the meaning of that correctly.
TrueNAS SCALE Version: ElectricEel-24.10.0
Plex Version: 1.0.24
Immich Version: 1.6.24
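For context on that code 139: when a process is killed by a signal, the shell reports exit status 128 + the signal number, and signal 11 is SIGSEGV. A quick shell check confirms the mapping:

```shell
# A worker exit code of 139 means 139 - 128 = signal 11; ask the shell which
# signal that is (prints SEGV, i.e. a segmentation fault)
kill -l 11
```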
I do not have any displays connected.
I have followed this post which details adding the following code…
resources:
  gpus:
    nvidia_gpu_selection:
      '0000:07:00.0':
        use_gpu: true
        uuid: ''   <<-- the problem
    use_all_gpus: false
… to the user_config.yaml file, located in the ixVolume volume, found at /mnt/.ix-apps/user_config.yaml, and setting the IOMMU and UUID values correctly – which I have done.
I also came across this post. However, I’m able to run the nvidia-smi command without errors.
Interestingly, I don’t have any of the following files on my system:
/etc/modprobe.d/kvm.conf
/etc/modprobe.d/nvidia.conf
/etc/modprobe.d/vfio.conf
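Since those files are absent, one way to double-check that nothing else in modprobe’s configuration blacklists the NVIDIA driver or binds the card to vfio is a recursive grep over the standard config directory (a sketch, assuming GNU grep):

```shell
# Search all modprobe config for nvidia/vfio/kvm directives; a "blacklist nvidia"
# or a vfio-pci bind would show up here. Prints a note instead if nothing matches.
grep -ri 'nvidia\|vfio\|kvm' /etc/modprobe.d/ 2>/dev/null \
  || echo "no nvidia/vfio/kvm modprobe config found"
```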
My system also doesn’t present me with any GPUs available for isolation, as shown in the screenshot further below.
Is anyone able to point me in the right direction as to what I should do?
— — — Additional Info — — —
nvidia-smi
Output
root@truenas[~]# nvidia-smi
Thu Oct 31 12:24:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 ... Off | 00000000:01:00.0 Off | N/A |
| 28% 43C P0 N/A / 125W | 1MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
modprobe
Output
root@truenas[~]# modprobe nvidia_current_drm
modprobe: FATAL: Module nvidia_current_drm not found in directory /lib/modules/6.6.44-production+truenas
root@truenas[~]# modprobe nvidia-current
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/6.6.44-production+truenas
lsmod
Output
root@truenas[~]# lsmod | grep nvidia
nvidia_uvm           4911104  0
nvidia_drm            118784  0
nvidia_modeset       1605632  1 nvidia_drm
nvidia              60620800  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        249856  4 ast,nvidia_drm
drm                   757760  6 drm_kms_helper,ast,drm_shmem_helper,nvidia,nvidia_drm
video                  73728  1 nvidia_modeset
lspci
Output
root@truenas[~]# lspci -v
...
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER]
    Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 1
    Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at f7000000 [virtual] [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [258] L1 PM Substates
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] Secondary PCI Express
    Capabilities: [bb0] Physical Resizable BAR
    Kernel driver in use: nvidia
    Kernel modules: nouveau, nvidia_drm, nvidia
Screenshots (attached):
No GPUs available to Isolate
Application Settings
Plex Resources
Plex Transcoding Settings – no specific GPU available
These are all the IOMMU groups I currently have.
root@truenas[~]# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/5/devices/0000:00:1c.0
/sys/kernel/iommu_groups/3/devices/0000:00:19.0
/sys/kernel/iommu_groups/11/devices/0000:04:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.2
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.3
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.1
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/12/devices/0000:05:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:14.0
/sys/kernel/iommu_groups/10/devices/0000:03:00.0
/sys/kernel/iommu_groups/10/devices/0000:02:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.2
/sys/kernel/iommu_groups/9/devices/0000:00:1f.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.3
/sys/kernel/iommu_groups/9/devices/0000:00:1f.6
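The find output above is easier to read when devices are grouped per IOMMU group. A small sketch (the helper name list_iommu_groups is mine; it takes the sysfs root as an argument so the same logic can be pointed at a test directory):

```shell
# list_iommu_groups [ROOT]: print each IOMMU group and the PCI devices in it.
# ROOT defaults to the real sysfs path; pass a directory to exercise the logic.
list_iommu_groups() {
  root="${1:-/sys/kernel/iommu_groups}"
  for g in "$root"/*/; do
    [ -d "${g}devices" ] || continue
    printf 'group %s:' "$(basename "$g")"
    for d in "${g}devices"/*; do
      printf ' %s' "$(basename "$d")"
    done
    printf '\n'
  done
}

list_iommu_groups   # against the live system; prints nothing if the path is absent
```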
It seems that my GPU 0000:01:00.0 and its other functions (audio, USB, and serial bus controllers) all share IOMMU group 1, together with the PCIe bridge 0000:00:01.0.
root@truenas[~]# lspci -Dnn | grep -i NVIDIA
0000:01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] [10de:21c4] (rev a1)
0000:01:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
0000:01:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
0000:01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
I also ran the following command:
root@truenas[~]# dmesg | grep -i 'vga\|display\|nvidia'
[    0.211805] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[    0.211805] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.211805] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.211805] pci 0000:03:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    0.211805] pci 0000:03:00.0: vgaarb: bridge control possible
[    0.211805] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.211805] vgaarb: loaded
[    0.694443] fb0: EFI VGA frame buffer device
[   12.876613] ast 0000:03:00.0: vgaarb: deactivate vga console
[   12.876769] ast 0000:03:00.0: [drm] Using analog VGA
[   12.907490] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[   12.962552] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input6
[   12.963921] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input7
[   12.963960] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input8
[   12.963992] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input9
[   13.529272] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[   13.530190] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[   13.530340] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[   13.576894] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 550.127.05 Tue Oct 8 03:22:07 UTC 2024
[   13.618688] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.127.05 Tue Oct 8 02:56:05 UTC 2024
[   13.626835] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   13.626838] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[  112.662664] audit: type=1400 audit(1730371701.484:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=3350 comm="apparmor_parser"
[  112.663725] audit: type=1400 audit(1730371701.484:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=3350 comm="apparmor_parser"
[  164.737708] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[  164.794357] nvidia-uvm: Loaded the UVM driver, major device number 237.
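Assuming a standard Linux sysfs layout, the boot VGA assignment that dmesg reports can also be read directly: each VGA-class PCI device exposes a boot_vga attribute set to 1 for the firmware boot display. A sketch (list_boot_vga is a hypothetical helper name):

```shell
# list_boot_vga [DIR]: print the boot_vga flag (1 = firmware boot display) for
# each PCI device under DIR that exposes one; DIR defaults to the real sysfs path
list_boot_vga() {
  for d in "${1:-/sys/bus/pci/devices}"/*; do
    if [ -f "$d/boot_vga" ]; then
      echo "$(basename "$d"): boot_vga=$(cat "$d/boot_vga")"
    fi
  done
}

# On the system above this would be expected to show 0000:03:00.0 (the ast
# device) with boot_vga=1, matching the "overriding previous" dmesg line
list_boot_vga
```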
This shows both my GPUs.
My other VGA device, which is my motherboard’s ASPEED AST, is identified as ast 0000:03:00.0. This device seems to be designated as the boot VGA, while my NVIDIA GPU (located at 0000:01:00.0) has also been configured with VGA arbitration – assuming that’s what the references to VGA and framebuffer mean.
What do I need to do to get TrueNAS to release my NVIDIA GPU so that my apps can actually use it? Assuming that is the issue…