Unscheduled System Reboot on FreeNAS 11.2 Server
Description
Problem/Justification
Impact
Activity
Alexander Motin February 11, 2020 at 7:19 PM
Without further information, my guess is that this is the same false positive that was fixed in 11.3-RC2 as part of NAS-104334. Closing it for now.
FameWolf October 29, 2019 at 4:32 PM
Thank you. I'll continue to monitor it. I've also run some SMART tests, all of which appear to be passing. Of course, it isn't happening while I've basically been "sitting on it" to watch for the cause. If necessary, this can be closed.
Alexander Motin October 29, 2019 at 2:43 PM
In the provided debug I see a number of kernel dumps with a panic message like this:
The gptid in the last two points to the ada5 disk; in the previous three it points to ada3. This panic is supposed to indicate that some I/O request hasn't completed within 1000 seconds. But it looks somewhat suspicious to me, considering I see no disk errors or timeouts before the crashes, even though there are some errors at other times. Unfortunately, I have no better explanation.
You may try setting the vfs.zfs.deadman_enabled loader tunable to 0 and see whether you notice any other ill effects instead, or whether these are some kind of false alarm. You may also look at CPU utilization and disk queue depth in the time before the crashes. With the system dataset residing on a pool, you should be able to see the pre-reboot graphs after a reboot.
FameWolf October 27, 2019 at 6:28 PM
Thanks, Anthony, for replying. The server in question uses rsync to sync some folders to the backup FreeNAS, but the one rebooting is the master that does the push, so I don't think it's that particular crash.
Anthony Takata (Tsaukpaetra) October 26, 2019 at 10:48 PM
I get those parity/CRC errors all the time; I believe it's a faulty cable, but it doesn't usually cause a crash (the message is mostly informative).
Maybe it's related to the replication crash?
This is almost certainly going to be a hardware issue rather than a FreeNAS issue. The server in question recently had to have a drive and the motherboard replaced due to a really bad storm in the area (at least I assume it was due to the storm). At that time it was also upgraded from FreeNAS 9.x to 11.2-U6. It's a private server supporting 1 user, so the hardware doesn't meet all the suggested requirements for a production server. When I replaced the motherboard, I purchased 2 identical motherboards and CPUs and set up a "backup FreeNAS server" that has been running fine over the same time period.
The system has had several unscheduled system reboots. No alerts are currently showing on the server. I put a box fan blowing directly on the server to try to rule out overheating of the CPU or drives as a cause. The error messages (sent to me by email) were:
freenas.local had an unscheduled system reboot.
The operating system successfully came back online at Fri Oct 25 07:59:22 2019.
freenas.local had an unscheduled system reboot.
The operating system successfully came back online at Wed Oct 23 13:41:08 2019.
freenas.local had an unscheduled system reboot.
The operating system successfully came back online at Thu Oct 17 11:14:27 2019.
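For anyone trying to line these reboots up against the reporting graphs or a cron/rsync schedule, a rough way to see the spacing between them is to parse the timestamps from the alert emails. This is only a sketch: the function name reboot_intervals is made up for illustration, the timestamps are copied from the three alerts above, and the parsing assumes an English (C) locale for the day and month names.

```python
from datetime import datetime

# Timestamps copied verbatim from the three alert emails above.
ALERT_TIMES = [
    "Fri Oct 25 07:59:22 2019",
    "Wed Oct 23 13:41:08 2019",
    "Thu Oct 17 11:14:27 2019",
]

# Format of the "came back online at ..." timestamp.
# Note: %a and %b are locale dependent; this assumes English names.
ALERT_FORMAT = "%a %b %d %H:%M:%S %Y"


def reboot_intervals(stamps):
    """Return the gaps between consecutive reboots, oldest first."""
    times = sorted(datetime.strptime(s, ALERT_FORMAT) for s in stamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in reboot_intervals(ALERT_TIMES):
        print(gap)
```

With only three data points this doesn't prove anything, but irregular gaps would at least argue against a fixed scheduled job being the trigger.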