upsd + lagg = communication errors until ups service restarted

Description

I have link aggregation (failover) configured on my TrueNAS 12.0 installation. What appears to happen is that networking has not yet fully started before upsd initializes, causing communication errors when it attempts to listen on localhost port 3493. Restarting the UPS services via GUI, running "service nut restart" in shell, or configuring "service nut restart" as a post-init script resolves the issue.

Logs show nut starting before networking:

Jul 30 23:08:18 freenas 1 2021-07-30T23:08:18.399917-04:00 freenas.workgroup root 2420 - - /etc/rc: WARNING: failed precmd routine for nut
Jul 30 23:08:18 freenas 1 2021-07-30T23:08:18.410016-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:18 freenas 1 2021-07-30T23:08:18.410031-04:00 freenas.workgroup upsmon 2427 - - Communications with UPS ups@localhost:3493 lost
Jul 30 23:08:18 freenas Security policy loaded: MAC/ntpd (mac_ntpd)
Jul 30 23:08:18 freenas proftpd[2436]: 127.0.0.1 - ProFTPD 1.3.6b (maint) (built Tue Jul 6 2021 20:21:58 UTC) standalone mode STARTUP
Jul 30 23:08:18 freenas kernel: bge0: link state changed to UP
Jul 30 23:08:18 freenas kernel: lagg0: link state changed to UP
Jul 30 23:08:19 freenas kernel: bge1: link state changed to UP
Jul 30 23:08:19 freenas kernel: bge2: link state changed to UP
Jul 30 23:08:19 freenas 1 2021-07-30T23:08:19.149488-04:00 freenas.workgroup collectd 2550 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
Jul 30 23:08:19 freenas kernel: bge3: link state changed to UP
Jul 30 23:08:23 freenas 1 2021-07-30T23:08:23.636886-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:23 freenas 1 2021-07-30T23:08:23.636909-04:00 freenas.workgroup upsmon 2427 - - UPS ups@localhost:3493 is unavailable
Jul 30 23:08:24 freenas syslog-ng[2757]: syslog-ng starting up; version='3.29.1'
Jul 30 23:08:24 freenas kernel: pid 2131 (syslog-ng), jid 0, uid 0: exited on signal 6 (core dumped)
Jul 30 23:08:28 freenas 1 2021-07-30T23:08:28.639294-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:33 freenas 1 2021-07-30T23:08:33.766635-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:39 freenas 1 2021-07-30T23:08:39.011275-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:39 freenas 1 2021-07-30T23:08:39.012546-04:00 freenas.workgroup collectd 2550 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
Jul 30 23:08:44 freenas 1 2021-07-30T23:08:44.039454-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:49 freenas 1 2021-07-30T23:08:49.040642-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:54 freenas 1 2021-07-30T23:08:54.059018-04:00 freenas.workgroup upsmon 2427 - - UPS [ups@localhost:3493]: connect failed: Connection failure: Connection refused
Jul 30 23:08:59 freenas 1 2021-07-30T23:08:59.078215-04:00 freenas.workgroup upsmon 2427 - - Communications with UPS ups@localhost:3493 established
Jul 30 23:09:30 freenas 1 2021-07-30T23:09:30.436326-04:00 freenas.workgroup upsmon 2426 - - upsmon parent: read

Best workaround seems to be setting it as a post-init script, but this does not clear the alert that communication with the UPS has been lost - restarting the UPS services does clear the alert but is obviously not automated.

Problem/Justification

None

Impact

None

Activity

Show:

ddrucker@mclean.harvard.edu November 17, 2023 at 1:47 AM

This issue is still present in TrueNAS 13.0-U4.

Bryan McKenna September 15, 2021 at 1:58 PM

Sorry for the delay, I should be available today from 330p-5p EST, or Friday from 9a-10a, 11a-12a, and 3p-4p EST. Additionally, any evening this week after 8:30p EST. If those times don't work, I can give you some availability for next week.

Waqar September 12, 2021 at 12:15 PM

 i would expect it to take 15-30 minutes  

Bryan McKenna September 11, 2021 at 2:09 AM

 sure, I'm East coast US. How long do we anticipate it will take to test? Length of time will likely determine my availability as during the day or in the evening.

Waqar September 10, 2021 at 3:48 PM

 can we please schedule a team viewer session to test a fix for the problem ? If yes, can you please advise a suitable time with timezone which works for you coming week ? Thanks!

Complete

Details

Assignee

Reporter

Labels

Time remaining

0m

Components

Fix versions

Affects versions

Priority

Katalon Platform

Created July 31, 2021 at 3:17 AM
Updated November 17, 2023 at 1:47 AM
Resolved September 10, 2021 at 4:09 PM