Thanks for using the TrueNAS Community Edition issue tracker! TrueNAS Enterprise users receive direct support for their reports from our support portal.

Kernel Panic: "usercopy: Kernel memory exposure attempt detected from vmalloc 'no area'"

Description

Hello, today my installation of TrueNAS SCALE 24.04.0 experienced 3 kernel panics.

I have 2 stack traces that were saved to disk; the 3rd failed to save.

1st Kernel Trace

May 14 23:39:50 nas kernel: usercopy: Kernel memory exposure attempt detected from vmalloc 'no area' (offset 0, size 68793)!
May 14 23:39:50 nas kernel: ------------[ cut here ]------------
May 14 23:39:50 nas kernel: kernel BUG at mm/usercopy.c:102!
May 14 23:39:50 nas kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
May 14 23:39:50 nas kernel: CPU: 2 PID: 2200593 Comm: iou-wrk-2200592 Tainted: P W OE 6.6.20-production+truenas #1
May 14 23:39:50 nas kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024
May 14 23:39:50 nas kernel: RIP: 0010:usercopy_abort+0x6c/0x80
May 14 23:39:50 nas kernel: Code: c4 98 51 48 c7 c2 5c e7 c4 98 41 52 48 c7 c7 d8 4a cc 98 48 0f 45 d6 48 c7 c6 cc c3 c4 98 48 89 c1 49 0f 45 f3 e8 64 6d d6 ff <0f> 0b 49 c7 c1 d5 33 c6 98 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
May 14 23:39:50 nas kernel: RSP: 0018:ffffaede01473aa0 EFLAGS: 00010246
May 14 23:39:50 nas kernel: RAX: 0000000000000060 RBX: ffffaedfc91a89cc RCX: 0000000000000000
May 14 23:39:50 nas kernel: RDX: 0000000000000000 RSI: ffffa0db1f2a13c0 RDI: ffffa0db1f2a13c0
May 14 23:39:50 nas kernel: RBP: 0000000000010cb9 R08: 0000000000000000 R09: ffffaede01473940
May 14 23:39:50 nas kernel: R10: 0000000000000003 R11: ffffa0db7fedb570 R12: 0000000000000001
May 14 23:39:50 nas kernel: R13: ffffaedfc91b9685 R14: 0000000000000000 R15: ffffa0c8e7579200
May 14 23:39:50 nas kernel: FS: 00007f34595d8ac0(0000) GS:ffffa0db1f280000(0000) knlGS:0000000000000000
May 14 23:39:50 nas kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 14 23:39:50 nas kernel: CR2: 00007f3455c0a000 CR3: 00000004aaa82001 CR4: 0000000000370ee0
May 14 23:39:50 nas kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 14 23:39:50 nas kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 14 23:39:50 nas kernel: Call Trace:
May 14 23:39:50 nas kernel: <TASK>
May 14 23:39:50 nas kernel: ? die+0x36/0x90
May 14 23:39:50 nas kernel: ? do_trap+0xda/0x100
May 14 23:39:50 nas kernel: ? usercopy_abort+0x6c/0x80
May 14 23:39:50 nas kernel: ? do_error_trap+0x6a/0x90
May 14 23:39:50 nas kernel: ? usercopy_abort+0x6c/0x80
May 14 23:39:50 nas kernel: ? exc_invalid_op+0x50/0x70
May 14 23:39:50 nas kernel: ? usercopy_abort+0x6c/0x80
May 14 23:39:50 nas kernel: ? asm_exc_invalid_op+0x1a/0x20
May 14 23:39:50 nas kernel: ? usercopy_abort+0x6c/0x80
May 14 23:39:50 nas kernel: __check_object_size+0x2b7/0x2c0
May 14 23:39:50 nas kernel: zfs_uiomove_iter+0x5b/0xe0 [zfs]
May 14 23:39:50 nas kernel: dmu_read_uio_dnode+0xc8/0x110 [zfs]
May 14 23:39:50 nas kernel: dmu_read_uio_dbuf+0x46/0x60 [zfs]
May 14 23:39:50 nas kernel: zfs_read+0x123/0x2f0 [zfs]
May 14 23:39:50 nas kernel: zpl_iter_read+0xc0/0x130 [zfs]
May 14 23:39:50 nas kernel: io_read+0xec/0x510
May 14 23:39:50 nas kernel: io_issue_sqe+0x63/0x3d0
May 14 23:39:50 nas kernel: io_wq_submit_work+0x8c/0x2d0
May 14 23:39:50 nas kernel: io_worker_handle_work+0x15c/0x5b0
May 14 23:39:50 nas kernel: io_wq_worker+0x10c/0x3b0
May 14 23:39:50 nas kernel: ? __pfx_io_wq_worker+0x10/0x10
May 14 23:39:50 nas kernel: ret_from_fork+0x34/0x50
May 14 23:39:50 nas kernel: ? __pfx_io_wq_worker+0x10/0x10
May 14 23:39:50 nas kernel: ret_from_fork_asm+0x1b/0x30
May 14 23:39:50 nas kernel: </TASK>
May 14 23:39:50 nas kernel: Modules linked in: nf_conntrack_netlink(E) veth(E) nft_log(E) nft_limit(E) xt_limit(E) xt_NFLOG(E) nfnetlink_log(E) xt_physdev(E) xt_multiport(E) ip_vs_rr(E) dummy(E) ip_set_hash_ipport(E) xt_ipvs(E) xt_set(E) ip_vs(E) ip_set_hash_ip(E) ip_set_hash_net(E) ip_set(E) xt_nat(E) xt_addrtype(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) iptable_filter(E) iptable_nat(E) nf_nat(E) br_netfilter(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) overlay(E) nfnetlink(E) rpcsec_gss_krb5(E) scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) rdma_cm(E) iw_cm(E) ib_cm(E) dlm(E) nvme_fabrics(E) binfmt_misc(E) bridge(E) stp(E) llc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) dca(E) essiv(E) authenc(E) crypto_null(E) dm_crypt(E) ib_core(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency_common(E) kvm_intel(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E)
May 14 23:39:50 nas kernel: sha256_ssse3(E) sha1_ssse3(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) iTCO_wdt(E) bochs(E) snd_pcm(E) intel_pmc_bxt(E) drm_vram_helper(E) pcspkr(E) drm_ttm_helper(E) snd_timer(E) iTCO_vendor_support(E) ttm(E) watchdog(E) virtio_console(E) snd(E) drm_kms_helper(E) soundcore(E) evdev(E) joydev(E) button(E) serio_raw(E) sg(E) nfsd(E) nfs_acl(E) lockd(E) auth_rpcgss(E) grace(E) loop(E) efi_pstore(E) drm(E) dm_mod(E) configfs(E) sunrpc(E) qemu_fw_cfg(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) ses(E) sr_mod(E) enclosure(E) hid_generic(E) cdrom(E) usbhid(E) hid(E) nvme(E) ahci(E) ahciem(E) nvme_core(E) sd_mod(E) libahci(E) t10_pi(E) mpt3sas(E) uhci_hcd(E) ehci_pci(E) crc64_rocksoft(E) raid_class(E)
May 14 23:39:50 nas kernel: libata(E) crc64(E) scsi_transport_sas(E) ehci_hcd(E) virtio_scsi(E) crc_t10dif(E) virtio_net(E) crct10dif_generic(E) crc32_pclmul(E) net_failover(E) usbcore(E) i2c_i801(E) crct10dif_pclmul(E) crc32c_intel(E) scsi_mod(E) psmouse(E) lpc_ich(E) failover(E) i2c_smbus(E) crct10dif_common(E) scsi_common(E) usb_common(E)
May 14 23:39:50 nas kernel: ---[ end trace 0000000000000000 ]---
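The call trace shows the copy being validated on the io_uring read path (io_read → zfs_read → dmu_read_uio_dbuf → zfs_uiomove_iter), so an io_uring-driven read workload against a dataset on the pool is the kind of load that exercises this path. A hedged reproduction sketch using fio's io_uring engine (the directory /mnt/tank/fio-test is a placeholder, not from the original report; substitute a dataset on the affected pool):

```shell
# Reproduction sketch: stress the io_uring read path seen in the trace.
# /mnt/tank/fio-test is a placeholder; point it at a dataset on the
# affected pool.  Exits quietly when fio or the directory is absent.
command -v fio >/dev/null || { echo "fio not installed"; exit 0; }
[ -d /mnt/tank/fio-test ] || { echo "test directory not present"; exit 0; }

fio --name=uring-read --ioengine=io_uring --rw=randread \
    --bs=128k --size=1g --numjobs=4 --time_based --runtime=60 \
    --directory=/mnt/tank/fio-test
```

This is only a sketch of a plausible trigger, not a confirmed reproducer; the original crashes happened under the reporter's normal workload.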

2nd Kernel Trace (this one appears not to have been completely saved to disk)

May 15 12:28:14 nas kernel: usercopy: Kernel memory exposure attempt detected from vmalloc 'no area' (offset 0, size 47731)!
May 15 12:28:14 nas kernel: ------------[ cut here ]------------
May 15 12:28:14 nas kernel: kernel BUG at mm/usercopy.c:102!
May 15 12:28:14 nas kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
May 15 12:28:14 nas kernel: CPU: 35 PID: 1280051 Comm: iou-wrk-1280048 Tainted: P OE 6.6.20-production+truenas #1
May 15 12:28:14 nas kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024
May 15 12:28:14 nas kernel: RIP: 0010:usercopy_abort+0x6c/0x80
May 15 12:28:14 nas kernel: Code: c4 bc 51 48 c7 c2 5c e7 c4 bc 41 52 48 c7 c7 d8 4a cc bc 48 0f 45 d6 48 c7 c6 cc c3 c4 bc 48 89 c1 49 0f 45 f3 e8 64 6d d6 ff <0f> 0b 49 c7 c1 d5 33 c6 bc 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
May 15 12:28:14 nas kernel: RSP: 0018:ffffb3ff85877aa0 EFLAGS: 00010246
May 15 12:28:14 nas kernel: RAX: 0000000000000060 RBX: ffffb4009a588c35 RCX: 0000000000000000
May 15 12:28:14 nas kernel: RDX: 0000000000000000 RSI: ffffa0c55fae13c0 RDI: ffffa0c55fae13c0
May 15 12:28:14 nas kernel: RBP: 000000000000ba73 R08: 0000000000000000 R09: ffffb3ff85877940
May 15 12:28:14 nas kernel: R10: 0000000000000003 R11: ffffa0c5bff20f48 R12: 0000000000000001
May 15 12:28:14 nas kernel: R13: ffffb4009a5946a8 R14: 0000000000000000 R15: ffffa0af3f3b1e00
May 15 12:28:14 nas kernel: FS: 00007fd245f4bac0(0000) GS:ffffa0c55fac0000(0000) knlGS:0000000000000000
May 15 12:28:14 nas kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 15 12:28:14 nas kernel: CR2: 00007fd242bb0000 CR3: 00000002cb060005 CR4: 0000000000370ee0
May 15 12:28:14 nas kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 15 12:28:14 nas kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 15 12:28:14 nas kernel: Call Trace:
May 15 12:28:14 nas kernel: <TASK>
May 15 12:28:14 nas kernel: ? die+0x36/0x90
May 15 12:28:14 nas kernel: ? do_trap+0xda/0x100
May 15 12:28:14 nas kernel: ? usercopy_abort+0x6c/0x80
May 15 12:28:14 nas kernel: ? do_error_trap+0x6a/0x90
May 15 12:28:14 nas kernel: ? usercopy_abort+0x6c/0x80
May 15 12:28:14 nas kernel: ? exc_invalid_op+0x50/0x70
May 15 12:28:14 nas kernel: ? usercopy_abort+0x6c/0x80
May 15 12:28:14 nas kernel: ? asm_exc_invalid_op+0x1a/0x20
May 15 12:28:14 nas kernel: ? usercopy_abort+0x6c/0x80
May 15 12:28:14 nas kernel: __check_object_size+0x2b7/0x2c0
May 15 12:28:14 nas kernel: zfs_uiomove_iter+0x5b/0xe0 [zfs]
May 15 12:28:14 nas kernel: dmu_read_uio_dnode+0xc8/0x110 [zfs]
May 15 12:28:14 nas kernel: dmu_read_uio_dbuf+0x46/0x60 [zfs]
May 15 12:28:14 nas kernel: zfs_read+0x123/0x2f0 [zfs]
May 15 12:28:14 nas kernel: zpl_iter_read+0xc0/0x130 [zfs]
May 15 12:28:14 nas kernel: io_read+0xec/0x510
May 15 12:28:14 nas kernel: io_issue_sqe+0x63/0x3d0
May 15 12:28:14 nas kernel: io_wq_submit_work+0x8c/0x2d0
May 15 12:28:14 nas kernel: io_worker_handle_work+0x15c/0x5b0
May 15 12:28:14 nas kernel: io_wq_worker+0x10c/0x3b0
May 15 12:28:14 nas kernel: ? __pfx_io_wq_worker+0x10/0x10
May 15 12:28:14 nas kernel: ? _raw_spin_unlock+0xe/0x30
May 15 12:28:14 nas kernel: ? __pfx_io_wq_worker+0x10/0x10
May 15 12:28:14 nas kernel: ret_from_fork+0x34/0x50
May 15 12:28:14 nas kernel: ? __pfx_io_wq_worker+0x10/0x10
May 15 12:28:14 nas kernel: ret_from_fork_asm+0x1b/0x30
May 15 12:28:14 nas kernel: </TASK>
May 15 12:28:14 nas kernel: Modules linked in: nf_conntrack_netlink(E) veth(E) nft_log(E) nft_limit(E) xt_limit(E) xt_NFLOG(E) nfnetlink_log(E) xt_physdev(E) xt_multiport(E) xt_addrtype(E) ip_vs_rr(E) dummy(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) ip_set_hash_ipport(E) xt_nat(E) xt_ipvs(E) ip_vs(E) xt_set(E) ip_set_hash_ip(E) ip_set_hash_net(E) ip_set(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) nfnetlink(E) iptable_filter(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) overlay(E) br_netfilter(E) scst_vdisk(OE) isert_scst(OE) iscsi_scst(OE) scst(OE) rdma_cm(E) iw_cm(E) ib_cm(E) dlm(E) nvme_fabrics(E) binfmt_misc(E) bridge(E) stp(E) llc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) dca(E) essiv(E) authenc(E) crypto_null(E) dm_crypt(E) ib_core(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency_common(E) kvm_intel(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E)

It’s worth noting that I’ve seen a couple other crashes since upgrading to 24.04.0, but today is the first time I’ve gotten a viable stack trace from syslog-ng to submit a bug report.

 

 

Debug file can be found here: https://ixsystems.atlassian.net/servicedesk/customer/portal/15/TPF-3040

Since this sounds like a memory problem, maybe it’s related to memory fragmentation? I know that has been an issue in the past.
This 24-hour chart from Netdata provides an interesting view of memory fragmentation leading up to the crashes.
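For anyone without Netdata, the kernel's own view of fragmentation can be spot-checked from the shell. A minimal sketch reading /proc/buddyinfo (column layout assumed from the standard procfs format, where columns 5 onward are free-page counts for orders 0 through 10):

```shell
# Count free high-order pages (order >= 4, i.e. contiguous chunks of
# 64 KiB and up on a 4 KiB-page system) per memory zone.  Very few
# pages at high orders means physical memory is heavily fragmented.
awk '$3 == "zone" {
    s = 0
    for (i = 9; i <= NF; i++)   # $9 onward = order 4 and above
        s += $i
    printf "node %s zone %-8s order>=4 free pages: %d\n", $2, $4, s
}' /proc/buddyinfo
```

Sampling this periodically (e.g. from cron) gives a rough fragmentation trend to compare against crash times.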

 

Screenshot_20240515_165530.png

Problem/Justification

None

Impact

None

Attachments

  • 16 May 2024, 12:03 AM

Activity

Bug Clerk August 5, 2024 at 4:26 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Max Goodell June 14, 2024 at 4:14 AM

I was about to call this solved, but I had another crash about an hour ago. Unfortunately, it didn't save a trace to the logs. I've started logging over serial, so I should be able to grab the next one.

Alexander Motin June 10, 2024 at 2:07 PM

Any updates?

Max Goodell May 24, 2024 at 4:05 PM

Thanks for the tips! Now that I'm more familiar with ZFS I'll definitely use smaller vdevs next time I rebuild, though that'll have to wait until I need more hardware, since I'll need to shuffle 150 TB of data around.

As far as the failed disk goes, taking care of it has been on my to-do list for a long time. My Supermicro backplane has dry solder joints on one of the SAS receptacle connectors. Fixing it will require significant downtime to pull the rig apart, repair the joint, and make sure the other connectors aren't defective too.

I did have to revert to 23.10.2 due to the stability issues, and everything’s been stable since. I’ll update to 24.04.1 when it releases and let you know how it goes.

Alexander Motin May 24, 2024 at 2:43 PM

Those panics seem to be caused by a NULL dereference, which should not be related to the memory usage/fragmentation you mentioned. I have seen similar ones caused by insufficient error handling in the dbuf read path of ZFS. Some of those were actually fixed in 24.04.0, so I am surprised to see them again here. Next week we should release the 24.04.1 update, which I would ask you to try, since it includes some more fixes in that area.

Unrelated to the panics, it seems one of the disks in your pool has failed. Also, your 24-wide RAIDZ3 topology looks sub-optimal for your configuration. Such wide vdevs may make sense only for very large recordsizes, while most of your datasets have recordsize=128K. You could get much better performance and the same or better space efficiency if you used several smaller vdevs instead.
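For readers wanting to check their own setup against this advice, the pool topology and per-dataset recordsize can be inspected with the standard ZFS tooling. A sketch assuming a pool named "tank" (a placeholder, not the reporter's actual pool name):

```shell
# "tank" is a placeholder pool name.  Skip gracefully on machines
# without ZFS tooling installed.
command -v zpool >/dev/null || { echo "ZFS tools not installed"; exit 0; }

# Vdev layout: shows whether the pool is one very wide raidz3 vdev
# or several narrower vdevs, and flags any failed disks.
zpool status tank

# Per-dataset recordsize; very wide vdevs generally only pay off
# with records much larger than the 128K default.
zfs get -r -o name,value recordsize tank
```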

Resolution: Cannot Reproduce

Details


Impact

High


Created May 16, 2024 at 12:03 AM
Updated August 5, 2024 at 4:26 PM
Resolved August 5, 2024 at 4:26 PM
