UI and SSH Become Unresponsive or Inaccessible but jails are up

Description

I upgraded from 11.2 to 11.3, and since then the WebUI and SSH become unavailable some time after the machine is started or restarted (between 24 and 96 hours later), while the jails remain available. I have about 5 jails running, but I use SSH to do things like upgrade the jail internals, and the web front end to manage my system. This has been consistent since I upgraded to 11.3-BETA and still persists now that I'm on the release version. As I can't log in to the machine once it gets "frozen", I can't really provide more stats; below is an example of my session into the machine, where I'm using SSH to get in and then, in another window, using a terminal to try to get a response on port 80 from the HTTP server.

Problem/Justification

None

Impact

None

CHRISTOPHER DAWES March 22, 2020 at 10:10 AM

Many thanks. I've switched the boot device over to an SSD: a clean install with restored configuration (you guys are so awesome, it's amazing it works so well!) and it's all up and running. I will see if the problem vanishes, and many apologies if it does. Thanks again, Christopher

Alexander Motin March 12, 2020 at 6:03 PM

In your debug, I see a number of read/write errors on your da0 boot USB stick in several cases. Depending on what is actually affected there, it may theoretically cause ZFS to get stuck. Since your jails/plugins reside on the data pool, they will likely not be affected. I would confirm whether or not it is network related by logging in to the system from the console after it happens. If the console is also unresponsive, then it is not networking. In that case I would at least try typing Ctrl+T to see whether the system is able to report which active command it is waiting for. I would guess something related to ZFS.

PS: And generally I would recommend booting from something better than a USB stick. There are tons of cheap SATA and NVMe SSDs on the market, which are much better.
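The "is it networking or not" check suggested above can also be approximated from a client before walking to the console. The following is only a minimal sketch, not part of the original debugging: `probe_tcp` is a hypothetical helper that separates a refused or timed-out connection from one that the kernel accepts but the daemon never services (an SSH server normally sends its version banner right after connecting). The status strings and timeout values are assumptions chosen for illustration.

```python
import socket


def probe_tcp(host, port, timeout=3.0, expect_banner=False):
    """Classify one TCP connection attempt (hypothetical helper).

    Returns "open", "banner received", "connected, no banner",
    "closed by peer", "refused", "timeout", or "error".
    """
    try:
        # The timeout applies to the connect and to the later recv().
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if not expect_banner:
                return "open"
            try:
                # Wait for the server to speak first, as sshd normally does.
                data = conn.recv(64)
            except socket.timeout:
                # Handshake completed, but the daemon never answered.
                return "connected, no banner"
            return "banner received" if data else "closed by peer"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


# Example use against the address from this ticket:
#   probe_tcp("172.27.72.7", 22, expect_banner=True)
#   probe_tcp("172.27.72.7", 80)
```

A result of "connected, no banner" on port 22 while ping still answers would suggest the network stack is alive and the daemon itself is stuck, which fits the guess of something blocked on storage rather than a networking fault.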

CHRISTOPHER DAWES March 11, 2020 at 11:59 PM

Hi there, I've uploaded a video of the system in which I have an active SSH session into the server and then try to SSH in from another session. In the video I run arp -a and try an SSH session; the results are below. With the active connection still open, I exited htop and thought I'd SSH back to my machine to see what would happen, and nothing: the process couldn't be started. It's like the system ran out of handles, but how then could the jails keep running? I'm very bemused. Please see 

MacBook-Pro-2:~ chrisd$ arp -a

? (172.27.64.1) at e0:3f:49:e0:eb:c0 on en4 ifscope [ethernet]

freenas.local (172.27.72.7) at 68:5:ca:15:25:36 on en4 ifscope [ethernet]

? (172.27.72.24) at 2c:8:8c:d2:7c:8f on en4 ifscope [ethernet]

? (172.27.72.216) at 68:5:ca:15:25:36 on en4 ifscope [ethernet]

? (172.27.72.218) at 2:ff:60:5f:5f:8e on en4 ifscope [ethernet]

? (172.27.73.57) at 90:e1:7b:84:9d:fd on en4 ifscope [ethernet]

? (172.27.73.126) at 80:fa:5b:27:86:57 on en4 ifscope [ethernet]

? (172.27.73.157) at f8:6f:c1:24:e0:0 on en4 ifscope [ethernet]

? (172.27.73.186) at 40:cb:c0:c1:d8:c9 on en4 ifscope [ethernet]

? (172.27.73.203) at 8:f6:9c:69:c2:e5 on en4 ifscope [ethernet]

? (172.27.73.223) at 7c:61:66:58:4c:5d on en4 ifscope [ethernet]

? (172.27.73.246) at 90:dd:5d:d7:83:59 on en4 ifscope [ethernet]

? (172.27.75.1) at 9c:14:63:e9:a6:7e on en4 ifscope [ethernet]

? (224.0.0.251) at 1:0:5e:0:0:fb on en4 ifscope permanent [ethernet]

? (239.0.0.250) at 1:0:5e:0:0:fa on en4 ifscope permanent [ethernet]

? (239.255.255.250) at 1:0:5e:7f:ff:fa on en4 ifscope permanent [ethernet]

MacBook-Pro-2:~ chrisd$ ping 172.27.72.7

PING 172.27.72.7 (172.27.72.7): 56 data bytes

64 bytes from 172.27.72.7: icmp_seq=0 ttl=64 time=0.189 ms

64 bytes from 172.27.72.7: icmp_seq=1 ttl=64 time=0.225 ms

64 bytes from 172.27.72.7: icmp_seq=2 ttl=64 time=0.290 ms

^C

--- 172.27.72.7 ping statistics ---

3 packets transmitted, 3 packets received, 0.0% packet loss

round-trip min/avg/max/stddev = 0.189/0.235/0.290/0.042 ms

MacBook-Pro-2:~ chrisd$ netstat -an | grep 172.27.727

MacBook-Pro-2:~ chrisd$ netstat -an | grep 172.27.72.7

tcp4       0      0  172.27.72.10.61945     172.27.72.7.22         FIN_WAIT_2

tcp4       0      0  172.27.72.10.61871     172.27.72.7.22         FIN_WAIT_2

tcp4       0      0  172.27.72.10.61867     172.27.72.7.22         FIN_WAIT_2

tcp4       0      0  172.27.72.10.61866     172.27.72.7.22         FIN_WAIT_2

tcp4       0      0  172.27.72.10.54051     172.27.72.7.548        ESTABLISHED

tcp4       0      0  172.27.72.10.53978     172.27.72.7.22         ESTABLISHED

MacBook-Pro-2:~ chrisd$ ssh 172.27.72.7
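To make the netstat output above easier to compare between a healthy and a "frozen" state, the connection states can be tallied per server port. This is a rough sketch only: `tally_states` is a hypothetical helper, and it assumes the macOS `netstat -an` column layout shown above (protocol, two queue counters, local address, foreign address, state, with the port appended to the address after a final dot).

```python
from collections import Counter


def tally_states(netstat_text, server_ip):
    """Count TCP connections to server_ip, keyed by (remote port, state)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP entries.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        # "172.27.72.7.22" -> address "172.27.72.7", port "22"
        addr, _, port = foreign.rpartition(".")
        if addr == server_ip and port.isdigit():
            counts[(int(port), state)] += 1
    return counts
```

Applied to the lines captured above, this would give four port 22 connections in FIN_WAIT_2, one port 22 connection ESTABLISHED, and one port 548 connection ESTABLISHED.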

 

Waqar Ahmed March 10, 2020 at 4:17 PM

Can you also please open an SSH session, execute `htop`, and keep it open to see which resources are being used and what happens before the session finally gives way?

Waqar Ahmed March 10, 2020 at 3:09 PM

Can you please show `arp -a` from your client machine as well?

User Configuration Error

Created February 18, 2020 at 2:23 PM
Updated July 1, 2022 at 4:50 PM
Resolved March 23, 2020 at 2:25 PM