`failover.status` not responding for several minutes

Description

Seen on http://10.238.238.199/ (API tests in TrueNAS SCALE - Cobia Jenkins plan).

The web UI waits for a response to failover.status before showing the login form. On the machine in question, this endpoint does not respond even after several minutes of waiting, so the login form is never shown.

 

Problem/Justification

None

Impact

None

Attachments

1
  • 06 Feb 2023, 06:15 PM

Activity

Bug Clerk February 21, 2023 at 1:06 PM

Bug Clerk February 21, 2023 at 12:12 PM

Caleb February 13, 2023 at 4:05 PM

Please use m40g3-137.dc1.ixsystems.net root/abcd1234, as it’s experiencing the problem currently. Also, this is happening often; it’s not a transient problem.

Caleb February 13, 2023 at 4:04 PM

It seems we might have a regression on Cobia with our websocket changes. (Or it could be something else).

From my client PC, I have a script that connects to our public websocket endpoint and calls failover.status using the session id provided in the response after establishing a connection. The results are below.
The first time I ran the script, it responded within the same second:

caleb@caleb-win10:/mnt/c/Users/caleb/Downloads$ python3 websocket-test.py
CONNECTING TO: 'ws://m40g3-137.dc1.ixsystems.net:80/websocket'
getting failover.status 10:38:41
response: {'id': '4278aa14-0eba-4da8-9898-5c137f6da374', 'msg': 'result', 'result': 'MASTER'} 10:38:41

I ran the same thing again, and this time it took ~16 seconds:

CONNECTING TO: 'ws://m40g3-137.dc1.ixsystems.net:80/websocket'
getting failover.status 10:38:57
response: {'id': '0d2aa347-9497-4247-a7e1-3f16ea0ff1a9', 'msg': 'result', 'result': 'MASTER'} 10:39:13

I ran it again shortly after that one responded, and it still hasn't returned (it has been sitting for 15 minutes at this point):

python3 websocket-test.py
CONNECTING TO: 'ws://m40g3-137.dc1.ixsystems.net:80/websocket'
getting failover.status 10:39:40
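The websocket-test.py script itself is not included in this ticket. For anyone trying to reproduce the timings above, a minimal sketch of such a test might look like the following. It assumes the middleware speaks a DDP-style protocol (a `connect` frame answered by `connected`, then a `method` frame answered by a `result` frame with the same `id`), that failover.status can be called without logging in (the web UI calls it before showing the login form), and that the third-party `websocket-client` package is installed. The names `connect_message`, `method_message`, `timed_call`, and `run` are made up for this illustration and are not taken from the original script.

```python
import json
import time
import uuid


def connect_message():
    """Handshake frame (assumed DDP-style handshake)."""
    return json.dumps({"msg": "connect", "version": "1", "support": ["1"]})


def method_message(method, call_id, params=None):
    """Frame that asks the server to run `method` (assumed message layout)."""
    return json.dumps(
        {"id": call_id, "msg": "method", "method": method, "params": params or []}
    )


def timed_call(conn, method, timeout_errors=(TimeoutError,), clock=time.monotonic):
    """Call `method` over `conn` and return (reply, elapsed_seconds).

    `conn` only needs send() and recv(). `reply` is None if waiting for the
    result raised one of `timeout_errors`. A timeout during the handshake is
    not caught and propagates to the caller.
    """
    conn.send(connect_message())
    handshake = json.loads(conn.recv())
    if handshake.get("msg") != "connected":
        raise RuntimeError(f"unexpected handshake reply: {handshake!r}")

    call_id = str(uuid.uuid4())
    start = clock()
    conn.send(method_message(method, call_id))
    try:
        while True:
            reply = json.loads(conn.recv())
            # Skip unrelated frames; stop at the result for our call id.
            if reply.get("id") == call_id and reply.get("msg") == "result":
                return reply, clock() - start
    except timeout_errors:
        return None, clock() - start


def run(url, timeout=30.0):
    """Connect to `url`, call failover.status once, and print the timing."""
    import websocket  # third-party "websocket-client" package (assumed)

    conn = websocket.create_connection(url, timeout=timeout)
    try:
        reply, elapsed = timed_call(
            conn,
            "failover.status",
            timeout_errors=(websocket.WebSocketTimeoutException,),
        )
    finally:
        conn.close()
    if reply is None:
        print(f"no response after {elapsed:.1f}s")
    else:
        print(f"response: {reply} ({elapsed:.1f}s)")
    return reply, elapsed
```

Calling `run("ws://m40g3-137.dc1.ixsystems.net:80/websocket")` several times in a row should show whether the response time varies between calls. Note that the timeout applies to each individual receive, so a hung call is reported once no frame has arrived for `timeout` seconds, rather than waiting indefinitely.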

Ievgen Stepanovych February 7, 2023 at 11:01 AM

It’s doing it again.

This time it stopped working as I was using the machine, i.e. it worked for ~30 minutes and then stopped.

Complete
Created February 6, 2023 at 6:15 PM
Updated February 27, 2025 at 9:06 PM
Resolved February 21, 2023 at 1:06 PM