UI and SSH Become Unresponsive or Inaccessible but Jails Are Up
Activity

CHRISTOPHER DAWES March 22, 2020 at 10:10 AM
Many thanks. I've switched to running it from an SSD and done a clean install with the configuration restored (you guys are so awesome, it's amazing that it works so well!). Everything is up and running; I'll see whether the problem goes away, and many apologies if it does. Thanks again, Christopher

Alexander Motin March 12, 2020 at 6:03 PM
In your debug I see a number of read/write errors on your da0 boot USB stick in several places. Depending on what is actually affected, that could theoretically cause ZFS to get stuck. Since your jails/plugins reside on the data pool, they would likely not be affected. I would check whether the problem is network-related by logging in to the system from the console after it happens. If the console is also unresponsive, then it is not networking. In that case I would at least try typing Ctrl+T to see whether the system can report which command is active and what it is waiting for. My guess is that it is something related to ZFS.
PS: In general I would recommend booting from something better than a USB stick. There are tons of cheap SATA and NVMe SSDs on the market, and they are much better.
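The console check can be complemented from the client side. A small probe can separate "TCP never connects" (a networking problem) from "the kernel completes the handshake, but the process behind the port never answers" (a stuck userland, which would fit the ZFS guess). This works because sshd sends its `SSH-2.0-...` banner as soon as it accepts a connection. The FIN_WAIT_2 entries in the netstat output are at least consistent with the second case: the client's FIN was acknowledged, but the server side never closed its end. The sketch below is a hypothetical helper, not a FreeNAS tool. `probe` is an illustrative name, and it assumes Python 3 on the client machine using only the standard `socket` module.

```python
import socket
import sys


def probe(host: str, port: int, send: bytes = b"", timeout: float = 5.0) -> str:
    """Hypothetical helper: report how far a TCP service gets before stalling.

    "no-connect"  handshake refused, unreachable, or timed out
    "no-reply"    handshake completed, but nothing arrived within `timeout`
    "closed"      peer closed the connection without sending anything
    "error"       some other socket error after the handshake (e.g. reset)
    "reply: ..."  first line of whatever the service sent back
    """
    try:
        # create_connection also applies `timeout` to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # covers socket.timeout and ConnectionRefusedError
        return "no-connect"
    with sock:
        try:
            if send:
                sock.sendall(send)
            data = sock.recv(256)
        except socket.timeout:
            # The kernel accepted the connection, but the process
            # behind the port never answered.
            return "no-reply"
        except OSError:
            return "error"
    if not data:
        return "closed"
    return "reply: " + data.splitlines()[0].decode("latin-1")


# Only probe when a host is given, e.g. `python3 probe.py 172.27.72.7`.
if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]
    # sshd speaks first, so just wait for its banner.
    print("ssh  (22):", probe(host, 22))
    # An HTTP server waits for a request, so send a minimal one.
    print("http (80):", probe(host, 80, b"HEAD / HTTP/1.0\r\n\r\n"))
```

Run while the UI is unresponsive. If `ping` works and both ports report `no-reply`, the stall is probably above the network stack. A `no-connect` result would point back toward networking, a firewall, or a full listen queue.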

CHRISTOPHER DAWES March 11, 2020 at 11:59 PM
Hi there, I've uploaded a video I recorded of the system, where I have an active SSH session into the server and then try to SSH in from another session. In the video I run `arp -a` and try an SSH session; the results are below. With that connection still active, I exited htop and thought I'd SSH back to my machine to see what would happen, and nothing: the process couldn't be started. It's as if the system ran out of handles, but then how could the jails keep running? I'm very bemused. Please see
MacBook-Pro-2:~ chrisd$ arp -a
? (172.27.64.1) at e0:3f:49:e0:eb:c0 on en4 ifscope [ethernet]
freenas.local (172.27.72.7) at 68:5:ca:15:25:36 on en4 ifscope [ethernet]
? (172.27.72.24) at 2c:8:8c:d2:7c:8f on en4 ifscope [ethernet]
? (172.27.72.216) at 68:5:ca:15:25:36 on en4 ifscope [ethernet]
? (172.27.72.218) at 2:ff:60:5f:5f:8e on en4 ifscope [ethernet]
? (172.27.73.57) at 90:e1:7b:84:9d:fd on en4 ifscope [ethernet]
? (172.27.73.126) at 80:fa:5b:27:86:57 on en4 ifscope [ethernet]
? (172.27.73.157) at f8:6f:c1:24:e0:0 on en4 ifscope [ethernet]
? (172.27.73.186) at 40:cb:c0:c1:d8:c9 on en4 ifscope [ethernet]
? (172.27.73.203) at 8:f6:9c:69:c2:e5 on en4 ifscope [ethernet]
? (172.27.73.223) at 7c:61:66:58:4c:5d on en4 ifscope [ethernet]
? (172.27.73.246) at 90:dd:5d:d7:83:59 on en4 ifscope [ethernet]
? (172.27.75.1) at 9c:14:63:e9:a6:7e on en4 ifscope [ethernet]
? (224.0.0.251) at 1:0:5e:0:0:fb on en4 ifscope permanent [ethernet]
? (239.0.0.250) at 1:0:5e:0:0:fa on en4 ifscope permanent [ethernet]
? (239.255.255.250) at 1:0:5e:7f:ff:fa on en4 ifscope permanent [ethernet]
MacBook-Pro-2:~ chrisd$ ping 172.27.72.7
PING 172.27.72.7 (172.27.72.7): 56 data bytes
64 bytes from 172.27.72.7: icmp_seq=0 ttl=64 time=0.189 ms
64 bytes from 172.27.72.7: icmp_seq=1 ttl=64 time=0.225 ms
64 bytes from 172.27.72.7: icmp_seq=2 ttl=64 time=0.290 ms
^C
--- 172.27.72.7 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.189/0.235/0.290/0.042 ms
MacBook-Pro-2:~ chrisd$ netstat -an | grep 172.27.727
MacBook-Pro-2:~ chrisd$ netstat -an | grep 172.27.72.7
tcp4 0 0 172.27.72.10.61945 172.27.72.7.22 FIN_WAIT_2
tcp4 0 0 172.27.72.10.61871 172.27.72.7.22 FIN_WAIT_2
tcp4 0 0 172.27.72.10.61867 172.27.72.7.22 FIN_WAIT_2
tcp4 0 0 172.27.72.10.61866 172.27.72.7.22 FIN_WAIT_2
tcp4 0 0 172.27.72.10.54051 172.27.72.7.548 ESTABLISHED
tcp4 0 0 172.27.72.10.53978 172.27.72.7.22 ESTABLISHED
MacBook-Pro-2:~ chrisd$ ssh 172.27.72.7

Waqar Ahmed March 10, 2020 at 4:17 PM
Can you also please open an SSH session, run `htop`, and keep it open, so we can see which resources are being used and what happens before the session finally gives out?

Waqar Ahmed March 10, 2020 at 3:09 PM
Can you please show `arp -a` from your client machine as well?
Description
I upgraded from 11.2 to 11.3, and since then, somewhere between 24 and 96 hours after the machine is started or restarted, the WebUI and SSH become unavailable while the jails remain available. I have about 5 jails running; I SSH in to do things like upgrading the jail internals, and I use the web front end to manage the system. This has happened consistently since I upgraded to the 11.3 beta, and it still persists now that I'm on the release version. As I can't log in to the machine once it gets "frozen", I can't really provide more stats. Below is an example of my session into the machine, where I use SSH to get in and then, in another window, use Terminal to try to get a response on port 80 from the HTTP server.