Unable to get high uptime due to RAM problems with VMs and cache
Description
Problem/Justification
Impact
Activity
Umer Saleem August 18, 2023 at 12:24 PM
Hi @CZŁOWIEK, do we have any update on this ticket? Are you still facing this issue?
Umer Saleem August 16, 2023 at 3:14 PM (edited)
Hello @CZŁOWIEK, it seems that you upgraded to TrueNAS SCALE from TrueNAS CORE. Is that correct? CORE is based on FreeBSD, where zfs_arc_max is a sysctl, so on CORE you can update it using sysctl. By default it should be the larger of all_system_memory - 1GB and 5/8 × all_system_memory.
But SCALE is based on Linux, where zfs_arc_max is a kernel module parameter, located in /sys/module/zfs/parameters/zfs_arc_max. By default, the ARC can consume up to one half of the system memory. The default value is used if zfs_arc_max is set to 0. Can you please share the output of cat /sys/module/zfs/parameters/zfs_arc_max?
To update zfs_arc_max, you can try echo NEWVALUE >> /sys/module/zfs/parameters/zfs_arc_max.
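The defaults and the sysfs parameter described in this comment can be sketched in Python. This is only an illustration, not how TrueNAS middleware does it: the helper names (default_arc_max, read_arc_max, write_arc_max) are hypothetical, "1GB" is assumed to mean 1 GiB, and writing the real parameter file requires root and is assumed not to persist across reboots.

```python
from pathlib import Path

GIB = 1024 ** 3
# Path quoted in the comment above; only present on Linux with the zfs module loaded.
ARC_MAX_PATH = Path("/sys/module/zfs/parameters/zfs_arc_max")


def default_arc_max(total_bytes: int, platform: str) -> int:
    """Default ARC ceiling as described in the comment (hypothetical helper)."""
    if platform == "freebsd":
        # Larger of (all memory - 1 GiB) and 5/8 of all memory.
        return max(total_bytes - GIB, total_bytes * 5 // 8)
    if platform == "linux":
        # One half of system memory.
        return total_bytes // 2
    raise ValueError(f"unknown platform: {platform}")


def read_arc_max(path=ARC_MAX_PATH) -> int:
    """Return the current value; 0 means the built-in default is in effect."""
    return int(Path(path).read_text().strip())


def write_arc_max(value: int, path=ARC_MAX_PATH) -> None:
    """Write a new limit in bytes (equivalent to `echo VALUE > path`)."""
    if value < 0:
        raise ValueError("zfs_arc_max must not be negative")
    Path(path).write_text(f"{value}\n")
```

For a machine with 32 GiB of RAM, default_arc_max(32 * GIB, "linux") gives 16 GiB, while the FreeBSD rule gives 31 GiB, which is one way to see why a limit carried over from CORE may not behave the same on SCALE.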
Also, when you start a VM, there is a service in middleware that should reduce the memory consumed by ARC. I see it working in logs for one of your VM:
log/middlewared.log.3:63027:[2022/08/09 07:27:24] (INFO) VMService.__set_guest_vmemory():1217 - ===> Setting ARC FROM: 30688774528 TO: 28541290880
log/middlewared.log.3:63038:[2022/08/09 07:27:32] (INFO) VMService.__set_guest_vmemory():1217 - ===> Setting ARC FROM: 28541290880 TO: 25320065408
log/middlewared.log.3:63039:[2022/08/09 07:27:32] (INFO) VMService.__set_guest_vmemory():1217 - ===> Setting ARC FROM: 25320065408 TO: 18877614464
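To see how much the ARC limit was lowered in the excerpt above, the log lines can be parsed with a short script. This is a sketch only: the regular expression is written against the three lines quoted here and is an assumption about the log format, and the function names are hypothetical.

```python
import re

# Matches the "Setting ARC FROM: <bytes> TO: <bytes>" part of the quoted lines.
ARC_LINE = re.compile(r"Setting ARC FROM: (\d+) TO: (\d+)")

SAMPLE = [
    "[2022/08/09 07:27:24] (INFO) VMService.__set_guest_vmemory():1217 - ===> Setting ARC FROM: 30688774528 TO: 28541290880",
    "[2022/08/09 07:27:32] (INFO) VMService.__set_guest_vmemory():1217 - ===> Setting ARC FROM: 28541290880 TO: 25320065408",
    "[2022/08/09 07:27:32] (INFO) VMService.__set_guest_vmemory():1217 - ===> Setting ARC FROM: 25320065408 TO: 18877614464",
]


def arc_changes(lines):
    """Return (old, new) byte pairs for every ARC adjustment found."""
    changes = []
    for line in lines:
        match = ARC_LINE.search(line)
        if match:
            changes.append((int(match.group(1)), int(match.group(2))))
    return changes


def total_reduction(changes):
    """Bytes removed from the ARC limit between the first and last entry."""
    if not changes:
        return 0
    return changes[0][0] - changes[-1][1]


if __name__ == "__main__":
    gib = 1024 ** 3
    for old, new in arc_changes(SAMPLE):
        print(f"{old} -> {new} ({(old - new) / gib:.0f} GiB less)")
    print(f"total: {total_reduction(arc_changes(SAMPLE)) / gib:.0f} GiB")
```

On the three quoted lines the steps work out to 2 GiB, 3 GiB and 6 GiB, for a total reduction of 11 GiB.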
Also, setting the following parameters through sysctl is not supported on Linux; please look for them under /sys/module/zfs/parameters/PARAMETER instead:
log/syslog:14577:Jul 8 13:30:48 NAS2 systemd-sysctl[2807]: Couldn't write '0' to 'vfs/zfs/l2arc_noprefetch', ignoring: No such file or directory
log/syslog:14578:Jul 8 13:30:48 NAS2 systemd-sysctl[2807]: Couldn't write '10000000' to 'vfs/zfs/l2arc_write_max', ignoring: No such file or directory
log/syslog:14584:Jul 8 13:30:48 NAS2 systemd-sysctl[2807]: Couldn't write '30698774528' to 'vfs/zfs/arc_max', ignoring: No such file or directory
log/syslog:14587:Jul 8 13:30:48 NAS2 systemd-sysctl[2807]: Couldn't write '40000000' to 'vfs/zfs/l2arc_write_boost', ignoring: No such file or directory
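As a rough aid for migrating the four tunables from the syslog excerpt, the lookup below maps each FreeBSD-style key to the file it would correspond to under /sys/module/zfs/parameters. The table is an assumption limited to these four keys, not a general renaming rule, and linux_param_path is a hypothetical helper; check that the file exists on your system before writing to it.

```python
# Assumed mapping for the four keys in the syslog excerpt only.
LINUX_PARAM = {
    "vfs.zfs.arc_max": "zfs_arc_max",
    "vfs.zfs.l2arc_noprefetch": "l2arc_noprefetch",
    "vfs.zfs.l2arc_write_max": "l2arc_write_max",
    "vfs.zfs.l2arc_write_boost": "l2arc_write_boost",
}


def linux_param_path(sysctl_key: str, base: str = "/sys/module/zfs/parameters") -> str:
    """Translate a FreeBSD-style key (dots or slashes) to a module parameter path."""
    key = sysctl_key.replace("/", ".")
    if key not in LINUX_PARAM:
        raise KeyError(f"no known Linux equivalent for {sysctl_key!r}")
    return f"{base}/{LINUX_PARAM[key]}"


if __name__ == "__main__":
    # Keys exactly as they appear in the systemd-sysctl messages.
    for key in ("vfs/zfs/l2arc_noprefetch", "vfs/zfs/arc_max"):
        print(key, "->", linux_param_path(key))
```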
Daniel Pizappi July 20, 2023 at 2:07 PM
Thanks for adding the debug file! This is in our queue to review now. An engineering representative will update with any further questions or details in the near future.
CZŁOWIEK July 19, 2023 at 4:32 PM
Ideally the system would grow the cache when no VMs are running and cut it as soon as one is started, so that no problems arise. The buffer of unused RAM would be configurable.
Michelle Johnson July 19, 2023 at 3:32 PM
Thank you for your report, @CZŁOWIEK!
Please use the link in the system-generated message below to attach a system debug file. Link to this ticket after you upload the file and before you click Save.
To generate a debug file on TrueNAS SCALE, log in to the TrueNAS web interface, go to System Settings > Advanced, then click Save Debug and wait for the file to download to your local system.
Hello,
Long story short: when running TrueNAS SCALE with VMs and services, I found that the ZFS cache grows to use the free RAM after the VMs start. However, if one wants to change a VM's settings and has to stop it, it soon cannot be started again without problems, because its RAM has been eaten by the ZFS cache (ERROR: Not Enough Memory).
I had a case where a VM had problems powering down and was unresponsive for 30+ minutes, with GUI errors about RAM.
ZFS cache growth could be limited by zfs_arc_max, but setting it through the GUI only results in the error “Sysctl 'zfs_arc_max' does not exist in kernel.”
Also, under Sysctl I have entries marked “Generated by autotune”, left over from the migration from CORE, that look like they cannot be deleted (“[Errno 2] No such file or directory: '/proc/sys/kern/ipc/maxsockbuf'”), but they are deleted successfully nevertheless (at least from the GUI).
ZFS cache should be configurable via Sysctl