Manual reboot of active controller via SSH breaks HA on SCALE

Description

I attempted to fail over by running reboot in an SSH session on the active controller.

Now the passive controller is frozen in this state and HA never becomes healthy again.

Problem/Justification

None

Impact

None

Attachments

1
  • 21 Mar 2023, 09:00 PM

Activity

Automation for Jira, March 23, 2023 at 1:31 PM

This issue has now been closed. Comments made after this point may not be viewed by the TrueNAS Teams. Please open a new issue if you have found a problem or need to re-engage with the TrueNAS Engineering Teams.

Caleb, March 22, 2023 at 12:32 PM

Could you help me investigate this one, please? Here is what I suspect.

1. fenced is running
2. reboot invokes systemd, which does its various things related to shutdown.target
3. the keepalived systemd unit gets stopped, which sends a BACKUP event
4. we start processing the failover event and begin to export pools
5. at the same time, systemd continues stopping services and the middlewared process gets stopped (more than likely killed)
6. because middlewared gets killed, processing of the failover event stops
7. fenced is also stopped at some point in this process
8. by this point, the other controller has received a MASTER event and has started fenced
9. the other controller detects that the SCSI reservation keys did NOT change on the disks (because fenced was stopped/killed during the reboot process)
10. the other controller reserves the disks while the export of the zpools never completes
11. everything hangs
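
To make the suspected ordering easier to reason about, here is a toy Python replay of the steps above. It is only a sketch of the hypothesis, not the real fenced or middlewared logic: the Node class, the integer key, and the rule "unchanged key means the other node takes the reservation" are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified model of one HA controller."""
    name: str
    fenced_running: bool = False
    middlewared_running: bool = True
    holds_reservation: bool = False
    events: list = field(default_factory=list)


def replay_suspected_sequence():
    """Replay steps 1-11 of the suspected race and report the outcome."""
    # Step 1: fenced is running on the active controller.
    active = Node("active", fenced_running=True, holds_reservation=True)
    standby = Node("standby")
    key = 0  # stand-in for the SCSI reservation key (assumption)

    # Steps 2-4: reboot stops keepalived, BACKUP event starts the export.
    active.events.append("keepalived stopped, BACKUP event sent")
    active.events.append("pool export started")

    # Steps 5-6: middlewared is killed, so the export never finishes.
    active.middlewared_running = False
    export_done = active.middlewared_running

    # Step 7: fenced is stopped, so the key is no longer refreshed.
    active.fenced_running = False

    def sample_key():
        # The key only advances while fenced is alive on the active node.
        nonlocal key
        if active.fenced_running:
            key += 1
        return key

    # Steps 8-9: the standby gets MASTER and samples the key twice.
    standby.events.append("MASTER event received, fenced started")
    standby.fenced_running = True
    keys_changed = sample_key() != sample_key()

    # Step 10: unchanged keys look like a dead peer, so reserve the disks.
    if not keys_changed:
        standby.holds_reservation = True

    # Step 11: disks are reserved while the old export is still incomplete.
    hung = standby.holds_reservation and not export_done
    return {"keys_changed": keys_changed, "export_done": export_done, "hung": hung}
```

Under these assumptions, replay_suspected_sequence() reports the keys as unchanged, the export as incomplete, and the system as hung, which matches the observed behavior. If the export were allowed to finish before middlewared and fenced are stopped, the model would no longer end in the hung state.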

I’m not sure what we can do in this scenario. I looked up systemd’s shutdown.target and the “Conflicts” option that can be added to systemd unit files, but I’m curious whether you could help me investigate it.
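
As a starting point for that investigation, a small Python helper could dump how a unit relates to shutdown.target. This is a sketch, not an existing TrueNAS tool: the function names are hypothetical, the SAMPLE text is illustrative input rather than output captured from a real system, and which units to inspect (for example the keepalived or middlewared units) is an assumption. It relies on systemctl show with --property=, which prints KEY=VALUE lines. Service units with default dependencies normally carry Conflicts= and Before= on shutdown.target, so the interesting part is the relative ordering between the units involved.

```python
import subprocess


def parse_show_output(text):
    """Parse KEY=VALUE lines from `systemctl show` into a dict of lists."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # Values such as Before= are space-separated unit names.
            props[key] = value.split()
    return props


def shutdown_relations(unit):
    """Return Conflicts/Before/After for a unit (needs a systemd host)."""
    out = subprocess.run(
        ["systemctl", "show", "--property=Conflicts,Before,After", unit],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_show_output(out)


# Illustrative input only, NOT captured from a real system.
SAMPLE = (
    "Conflicts=shutdown.target\n"
    "Before=shutdown.target multi-user.target\n"
    "After=network.target\n"
)
```

On an affected controller, calling shutdown_relations() for each unit involved and comparing the Before and After lists would show whether anything currently forces the failover processing to finish before middlewared and fenced are stopped.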

Details

Status: Complete
Impact: High

Created March 21, 2023 at 9:00 PM
Updated March 23, 2023 at 1:45 PM
Resolved March 23, 2023 at 1:45 PM