python3 error and unscheduled reboot
Activity

Alexander Motin November 1, 2023 at 1:55 PM
I cannot proceed without some definite direction to investigate. Since the target system was migrated to SCALE and is OK there, it does not sound like we will get any more information. Closing the ticket.

Alexander Motin October 11, 2023 at 6:29 PM
You’ve practically replaced all the software aside from ZFS, so I am not exactly surprised that it may solve the issue, if we accept that the memory corruption is caused by software. Since you have now mentioned that you are using NFS, it may be some NFS-specific issue, but that is only a small step in the diagnostic process. We generally recommend using iSCSI rather than NFS for VM storage. It is usually faster, though NFS has administrative benefits in some cases. You could try configuring iSCSI on Core and see what happens. If that works reliably, then the problem may be in the NFS code. But even once we know it is NFS, we would need some way to reproduce the problem, which may still be difficult.

Leonardo Buschiazzo October 10, 2023 at 6:07 PM
Hello Alejandro, as I mentioned in the previous update, the server in question has been working for more than 15 days without problems. The NFS share mounted on the ESXi server continues to work without errors, the disk scrubs complete without problems, and there are no Python errors, no unexpected restarts, and no excessive memory consumption by Python like before. The Win10 VM installed on NFS from ESXi works without problems. Perhaps the problem is due to an incompatibility between some server component and the BSD-based system? If you need any additional information, do not hesitate to request it. Thanks in advance.

Leonardo Buschiazzo September 30, 2023 at 8:51 PM
Hi Alexander, I understand, don't worry. I hope you can find something. I should mention a test that I have carried out. Using all of the same components of the server in question, I installed the Linux-based SCALE version. The pools were created again and the data was deleted, an NFS share was created again, the share was mounted on an ESXi server, and Win10 Pro was installed for testing. So far, after almost 4 days of work, the machine has not restarted unexpectedly, there are no Python errors, and everything works without problems. I will wait for a week of use and then update you. It did not work like this with the BSD-based Core version. Thank you.

Alexander Motin September 28, 2023 at 2:33 PM
I’m sorry, Leonardo, but your case is difficult. Memory corruptions always are. What we see in the debug are consequences of the problem, not the cause, and I don’t even have a direction in which to start searching. I am trying to look for other reports that may be related, but I cannot give any timeline, sorry.

Hello Alexander, we have followed your advice: we have updated the BIOS, done a cleanup, and run 3 days of memory checks with Memtest86 using a round-robin type of test, without errors. We have uploaded a new system debug. As I mentioned in the last message, we have the python3 error notification. We have not yet deleted the .core files as recommended; we are waiting to upload the dump and to receive your comments on how to continue with this case. Thank you very much in advance.