TrueNAS Scale beta: amdgpu missing firmware

Description

On TrueNAS-SCALE-22.12-BETA.2, the amdgpu firmware files for the 6400xt (and likely the 6500xt) are missing, although kernel support is present. The usual kernel messages appear on screen, but amdgpu fails to initialize and the text configuration menu is never shown on the monitor when the 6400xt is used to drive a display.

In this case, the 6400xt is the primary GPU on a TR1950X system with no integrated graphics. A secondary NVIDIA GPU is also installed and has been isolated for passthrough.

The relevant kernel messages look like this:
{{[ 16.848848] amdgpu 0000:43:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 16.860166] amdgpu 0000:43:00.0: device [1002:743f] error status/mask=00100000/00000000
[ 16.860169] amdgpu 0000:43:00.0: [20] UnsupReq (First)
[ 16.860171] amdgpu 0000:43:00.0: AER: TLP Header: 40001001 4000000f 9d87f000 00000000
[ 16.885005] amdgpu 0000:43:00.0: Direct firmware load for amdgpu/beige_goby_sos.bin failed with error -2
[ 16.894762] amdgpu 0000:43:00.0: amdgpu: failed to init sos firmware
[ 16.901297] [drm:psp_sw_init [amdgpu]] ERROR Failed to load psp firmware!
[ 16.908616] [drm:amdgpu_device_init.cold [amdgpu]] ERROR sw_init of IP block <psp> failed -2
[ 16.917598] amdgpu 0000:43:00.0: amdgpu: amdgpu_device_ip_init failed
[ 16.924191] amdgpu 0000:43:00.0: amdgpu: Fatal error during GPU init
[ 16.930694] amdgpu 0000:43:00.0: amdgpu: amdgpu: finishing device.
[ 16.937513] amdgpu: probe of 0000:43:00.0 failed with error -2
[ 16.943692] [drm] amdgpu: ttm finalized}}
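
The key line in the log above is the "Direct firmware load ... failed with error -2" message, where -2 is ENOENT (file not found). As a rough sketch for triaging similar logs, the following Python helper pulls the missing firmware paths out of dmesg text. The function name `find_missing_firmware` and the regular expression are illustrative assumptions based only on the message format shown in this ticket, not part of any TrueNAS tooling.

```python
import re

# Matches kernel lines shaped like the one in this ticket, e.g.:
# "amdgpu 0000:43:00.0: Direct firmware load for amdgpu/beige_goby_sos.bin failed with error -2"
# The pattern is an assumption derived from that single sample line.
FIRMWARE_ERROR = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"Direct firmware load for (?P<path>\S+) failed with error (?P<errno>-?\d+)"
)


def find_missing_firmware(log_text):
    """Return (driver, pci_device, firmware_path, errno) tuples found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = FIRMWARE_ERROR.search(line)
        if match:
            results.append(
                (match["driver"], match["device"], match["path"], int(match["errno"]))
            )
    return results
```

Feeding it the output of `dmesg` from an affected boot should list each firmware file the driver asked for and could not find, relative to the firmware search path.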

As an experiment, I copied the beige_goby_* firmware files from another system to /lib/firmware/amdgpu and reloaded amdgpu. This seems to work as expected: the TrueNAS text menu is displayed when the driver loads, and other entries such as hwmon and drm appear.
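
To confirm whether a given install already ships these files, a small check along the following lines may help. This is a minimal sketch: the helper name `list_firmware` is hypothetical, the default directory is the path mentioned above, and the `<prefix>_*` glob simply mirrors the beige_goby_* naming used in this report.

```python
from pathlib import Path


def list_firmware(prefix, firmware_dir="/lib/firmware/amdgpu"):
    """Return the sorted names of firmware files in firmware_dir starting with '<prefix>_'."""
    directory = Path(firmware_dir)
    if not directory.is_dir():
        # Directory is absent, so no firmware for this driver is installed.
        return []
    return sorted(p.name for p in directory.glob(f"{prefix}_*"))


if __name__ == "__main__":
    found = list_firmware("beige_goby")
    if found:
        print("Found firmware files:", ", ".join(found))
    else:
        print("No beige_goby_* firmware files found")
```

An empty result on an affected system would be consistent with the error -2 in the log, while a populated list after copying the files matches the workaround described here.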

Problem/Justification

None

Impact

None

Activity

William Gryzbowski May 2, 2023 at 3:56 PM

Please reopen the bug if the issue still exists on 22.12.2.

Bonnie Follweiler November 9, 2022 at 2:53 PM

Thank you.

I have moved this ticket into our queue to review.

An engineering representative will update with any further questions or details in the near future.

Kyle Flores November 7, 2022 at 11:18 PM

I uploaded debug logs collected this morning, but since the system has the firmware blobs copied over, I don’t think the error above will appear in any recent dmesg logs.

Bonnie Follweiler November 7, 2022 at 3:42 PM

Kyle Flores October 28, 2022 at 7:00 AM

Also should have mentioned that this test system was upgraded from angelfish rather than installed with the bluefin beta.

Need additional information
Created October 28, 2022 at 6:50 AM
Updated May 8, 2024 at 6:08 PM
Resolved May 2, 2023 at 3:56 PM