Multiple unexpected restarts
Activity

Bug Clerk, 2 days ago
This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Bug Clerk, 2 days ago
Sorry, but this does not look like a bug in TrueNAS. It looks like a combination of issues specific to your install and/or hardware. The logs are littered with messages like these:
Mar 23 22:31:34 truenas kernel: get_swap_device: Bad swap file entry 1ffffffffffff
Mar 23 22:31:34 truenas kernel: get_swap_device: Bad swap file entry 1ffffffffffff
Mar 23 22:31:34 truenas systemd-coredump[3369607]: Process 3462 (syslog-ng) of user 0 dumped core.
Stack trace of thread 3369489:
#0 0x00007f5f70c24e2e n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Mar 23 22:31:34 truenas systemd-coredump[3371785]: Process 3369633 (syslog-ng) of user 0 dumped core.
Stack trace of thread 3369650:
#0 0x00007f7546396e2e n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Mar 23 22:31:34 truenas systemd-coredump[3371887]: Process 3371813 (syslog-ng) of user 0 dumped core.
Stack trace of thread 3371869:
#0 0x00007fb3d1b94e2e n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Mar 23 22:33:19 truenas kernel: usb 2-1.3: device not accepting address 5, error -71
Mar 23 22:33:19 truenas kernel: usb 2-1.3: device not accepting address 6, error -71
Mar 23 22:33:19 truenas kernel: usb 2-1.3: device not accepting address 7, error -71
Mar 23 22:33:19 truenas kernel: usb 2-1.3: device not accepting address 8, error -71
Mar 23 22:33:19 truenas kernel: usb 2-1-port3: unable to enumerate USB device
Apr 1 22:21:59 truenas kernel: INFO: task txg_sync:2933 blocked for more than 120 seconds.
Apr 1 22:21:59 truenas kernel: Tainted: P OE 6.12.15-production+truenas #1
Apr 1 22:21:59 truenas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 1 22:24:00 truenas kernel: INFO: task txg_sync:2933 blocked for more than 241 seconds.
Apr 1 22:24:00 truenas kernel: Tainted: P OE 6.12.15-production+truenas #1
Apr 1 22:24:00 truenas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 2 20:04:12 truenas systemd-coredump[2742932]: Process 1506 (asyncio_loop) of user 0 dumped core.
Stack trace of thread 3919:
#0 0x00000000005ee650 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Apr 2 22:47:14 truenas systemd-coredump[2955435]: Process 11037 (python3) of user 0 dumped core.
Stack trace of thread 304:
#0 0x00007fd470fc5b00 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
This is not happening in a widespread fashion, and we do not have the resources to investigate your specific environment and/or hardware. We suggest investigating your app usage and the health of the hard drives in your zpools. The fact that txg_sync is “hung” is a bit alarming. It means the hard drives can’t keep up with the data being written and/or are responding extremely slowly, causing a cascading set of failures. Suggest starting simple: turn off/stop all VMs, apps, containers, etc., and try to correlate the reboots with a particular workload.
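To see whether the messages quoted above cluster on the days the reboots happened, a minimal sketch like the following could tally them per day from an exported log, for example the output of `journalctl -o short` saved to a file. It assumes syslog-style lines (`Mar 23 22:31:34 host ...`) as in the excerpts; the category names and regexes are illustrative, not a TrueNAS tool.

```python
import re
import sys
from collections import Counter, defaultdict

# Illustrative patterns based on the excerpts above (assumption: the wording of
# these kernel/systemd messages is stable enough to match on).
PATTERNS = {
    "bad_swap_entry": re.compile(r"get_swap_device: Bad swap file entry"),
    "coredump": re.compile(r"systemd-coredump\[\d+\]: Process \d+ \([^)]+\)"),
    "usb_error": re.compile(r"usb \S+: (device not accepting address|unable to enumerate)"),
    "hung_task": re.compile(r"INFO: task \S+:\d+ blocked for more than \d+ seconds"),
}

# Syslog-style prefix such as "Mar 23 22:31:34 ..." or "Apr  1 22:21:59 ..."
DAY_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2})\s")


def tally(lines):
    """Count matching messages per day: {"Mar 23": Counter({...}), ...}."""
    per_day = defaultdict(Counter)
    for line in lines:
        m = DAY_RE.match(line)
        if not m:
            continue  # skips continuation lines like "Stack trace of thread ..."
        day = " ".join(m.group(1).split())  # normalise "Apr  1" to "Apr 1"
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                per_day[day][name] += 1
    return per_day


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 tally_logs.py all.log
    with open(sys.argv[1], errors="replace") as log:
        for day, counts in tally(log).items():
            print(day, dict(counts))
```

Days where the hung_task or coredump counts spike just before a reboot would point to the workload worth isolating first.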

Linzi Moore, last week
Thanks for your submission! This is in our queue to review now. An engineering representative will update with any further questions or details in the near future.

Andrew Walker, last week (edited)
A userspace memory leak will typically not trigger a reboot. At most, the kernel's OOM killer will kill userspace applications.
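One way to check this against the logs is to look for OOM-killer messages: if memory pressure were the culprit, the expected trace is a kernel line such as `Out of memory: Killed process <pid> (<name>)` rather than a reboot. A minimal sketch, assuming the wording used by recent kernels (older kernels phrase the message differently):

```python
import re

# Matches the global OOM killer and the memory-cgroup variant, e.g.
#   "Out of memory: Killed process 1234 (name) total-vm:..."
#   "Memory cgroup out of memory: Killed process 1234 (name) ..."
# Assumption: recent-kernel wording of the message.
OOM_RE = re.compile(r"[Oo]ut of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def oom_kills(lines):
    """Return a list of (pid, process_name) tuples for every OOM kill found."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            kills.append((int(m.group("pid")), m.group("name")))
    return kills
```

An empty result for the logs leading up to a reboot would be consistent with the point above: whatever caused the reboot, it was not the OOM killer reacting to a leak.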

Bug Clerk, last week
Thank you for submitting this TrueNAS Bug Report! So that we can quickly investigate your issue, please attach a Debug file and any other information related to this issue through our secure and private upload service below. Debug files can be generated in the UI by navigating to System -> Advanced -> Save Debug.
https://ixsystems.atlassian.net/servicedesk/customer/portal/15/group/37/create/153
Details
Assignee: TrueNAS Backend Triage
Reporter: Hello World
Priority: Low
Description
The TrueNAS system has rebooted unexpectedly several times in the last two weeks. Each reboot happened while the system was under moderate load (oddly, it seems to run fine under high load). The most recent one occurred while I was reading a file list over SMB.
Every reboot except the most recent one also caused the Docker service to crash, and recovering it required a pool reset.
Looking at the memory data recorded by Netdata, system memory usage rose sharply around each reboot. I therefore suspect one of the following:
A memory leak.
A conflict between the system and the ARC cache, Docker, Incus, or some scheduling policy.
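To tell ordinary ARC growth apart from a possible leak, one could subtract the ARC's current size from the kernel's "used" memory. OpenZFS on Linux exposes ARC statistics in /proc/spl/kstat/zfs/arcstats (the `size` row is the ARC size in bytes), and because the ARC is not page cache it typically shows up as "used". A minimal sketch that parses both files from text; defining "used" as MemTotal minus MemAvailable is an assumption for illustration, and Netdata may compute it differently.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; kB fields are converted to bytes."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            value = int(parts[0])
            # Most fields carry a "kB" unit; a few (e.g. HugePages_Total) are plain counts.
            info[key.strip()] = value * 1024 if parts[-1] == "kB" else value
    return info


def parse_arcstats(text):
    """Parse /proc/spl/kstat/zfs/arcstats text into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # the kstat header line has more than three columns
        try:
            stats[parts[0]] = int(parts[2])
        except ValueError:
            continue  # the "name type data" column header
    return stats


def non_arc_used(meminfo_text, arcstats_text):
    """Used memory (MemTotal - MemAvailable) minus the ARC size, in bytes."""
    mem = parse_meminfo(meminfo_text)
    arc = parse_arcstats(arcstats_text)
    used = mem["MemTotal"] - mem["MemAvailable"]
    return used - arc.get("size", 0)
```

Sampling `non_arc_used` periodically (for example, once a minute, reading the two /proc files each time) and checking whether it keeps climbing while the ARC size stays flat would point towards a leak rather than normal ARC behaviour.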