12-BETA2 middleware still having disconnect/dropping/"blocking" issues - interesting log entries
Description
Problem/Justification
None
Impact
None
SmartDraw Connector
Katalon Manual Tests (BETA)
Activity
Show:

William Gryzbowski September 8, 2020 at 12:01 PM
BETA2.1 doesnt have these fixes.

Stilez September 6, 2020 at 2:17 AMEdited
See NAS-107473, middleware disconnects persist in BETA 2.1 after apparent fixes (including I suspect the RC1 fixes so far), but now seem to be on a precise 5 minute cycle, with a consistent error message??
But also a bunch of other errors too?

Stilez August 26, 2020 at 10:29 AM
See also: Pathological scrub behaviour now collated and issue created:
(may be relevant?)

William Gryzbowski August 25, 2020 at 4:55 PM
I will see if we can enable 12.1 nightlies enabled based on 12.0 code.

William Gryzbowski August 25, 2020 at 4:54 PM
Probably not as simple as I would like.
When this happens a few things could be grabbed manually:
midclt call core.threads_stacks
midclt call core.get_tasks
midclt call core.get_jobs
cat /var/log/middlewared.log
Duplicate
Pinned fields
Click on the next to a field label to start pinning.
Details
Details
Assignee

Reporter

Impact
High
Components
Fix versions
Affects versions
Priority
More fields
Time tracking
More fields
Time trackingKatalon Platform
Linked Test Cases, Katalon Defect Results, Katalon Studio Test Results
Katalon Platform
Linked Test Cases, Katalon Defect Results, Katalon Studio Test Results
Created August 24, 2020 at 2:44 PM
Updated July 1, 2022 at 4:55 PM
Resolved August 24, 2020 at 2:49 PM
Tried to log in to server, it took something like 3 minutes. The server is idle, no scrubs, no traffic, * nothing * going on.
So I looked at the middleware log, because client-server disconnects in middleware have been a thing and aren't yet fixed in BETA2. I saw interesting lines like these, in the recent middleware log. Worth a look.....
Debug attached (or will be!)
I am hoping that buried somewhere in this, is also the answer why middleware is being so flakey with connections and responsiveness on 12-BETA2.
SAMPLE OF middlewared.log:
This is a bit long for a sample, but there are multiple different kinds of serious-looking error in the log. The snip below shows:
40,000 jobs waiting and blocked, don't know what those might refer to or if its normal. Probably not normal, since there's also...
Multiple exceptions related to apparently blocked tasks that seem to relate to task management
Inability or failure to hook something middleware core
Monitor/subprocess thread blocked
Socket timeouts douyble exceptions
Crash report mentions
Reference in last line to "sys.exit(main())"
LOG SNIP:
[2020/08/24 15:00:09] (WARNING) middlewared.job.add():202 - There are 42266 jobs waiting or running
[2020/08/24 15:00:26] (WARNING) middlewared.job.add():202 - There are 42266 jobs waiting or running
[2020/08/24 15:00:43] (WARNING) middlewared.job.add():202 - There are 42266 jobs waiting or running
[2020/08/24 15:01:08] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1577, in main
Middleware(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1421, in run
self.loop.run_forever()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 169, in run
job = await self.next()
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 151, in next
if lock is None or not lock.locked():
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 50, in locked
def locked(self):
[2020/08/24 15:01:09] (WARNING) middlewared.job.add():202 - There are 42266 jobs waiting or running
[2020/08/24 15:01:58] (WARNING) middlewared.job.add():202 - There are 42266 jobs waiting or running
[2020/08/24 15:03:19] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:03:32] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1577, in main
Middleware(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1421, in run
self.loop.run_forever()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 169, in run
job = await self.next()
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 151, in next
if lock is None or not lock.locked():
[2020/08/24 15:03:34] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:03:51] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:04:11] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:04:48] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:05:01] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:05:39] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:05:52] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:06:05] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:06:18] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:06:38] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:06:54] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:07:13] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:07:33] (ERROR) middlewared.call_hook():1078 - Failed to run hook core.on_connect:<function ha_permission at 0x81cb1ae50>(*(), **{'app': <middlewared.main.Application object at 0x8f 106676 78306a0>})
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1074, in call_hook
await fut
File "/usr/local/lib/middlewared_truenas/plugins/failover.py", line 1217, in ha_permission
sock = app.request.transport.get_extra_info('socket')
AttributeError: 'NoneType' object has no attribute 'get_extra_info'
[2020/08/24 15:07:33] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:07:55] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:08:14] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:08:31] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:08:44] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:09:04] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:09:18] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:09:31] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:09:44] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:10:04] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:10:16] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1577, in main
Middleware(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1421, in run
self.loop.run_forever()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 169, in run
job = await self.next()
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 146, in next
lock = self.get_lock(job)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 114, in get_lock
lock = self.job_locks.get(name)
[2020/08/24 15:10:17] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:10:54] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:11:31] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:11:56] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:11:56] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1577, in main
Middleware(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1421, in run
self.loop.run_forever()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 169, in run
job = await self.next()
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 146, in next
lock = self.get_lock(job)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 118, in get_lock
lock.add_job(job)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 41, in add_job
def add_job(self, job):
[2020/08/24 15:12:12] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1577, in main
Middleware(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1421, in run
self.loop.run_forever()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 169, in run
job = await self.next()
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 151, in next
if lock is None or not lock.locked():
[2020/08/24 15:12:12] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
(some lines skipped)
[2020/08/24 15:25:37] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:25:57] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1577, in main
Middleware(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1421, in run
self.loop.run_forever()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 169, in run
job = await self.next()
File "/usr/local/lib/python3.8/site-packages/middlewared/job.py", line 144, in next
for job in self.queue:
[2020/08/24 15:25:57] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:26:13] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:26:30] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:27:08] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:27:23] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1191, in _call
return await methodobj(*prepared_call.args)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 973, in nf
return await f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/system.py", line 533, in info
uptime = (await (await Popen(
File "/usr/local/lib/python3.8/asyncio/subprocess.py", line 236, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1630, in subprocess_exec
transport = await self._make_subprocess_transport(
File "/usr/local/lib/python3.8/asyncio/unix_events.py", line 197, in _make_subprocess_transport
transp = _UnixSubprocessTransport(self, protocol, args, shell,
File "/usr/local/lib/python3.8/asyncio/base_subprocess.py", line 36, in _init_
self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,
File "/usr/local/lib/python3.8/asyncio/unix_events.py", line 789, in _start
self._proc = subprocess.Popen(
File "/usr/local/lib/python3.8/subprocess.py", line 854, in _init_
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.8/subprocess.py", line 1658, in _execute_child
part = os.read(errpipe_read, 50000)
[2020/08/24 15:27:25] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:27:42] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:28:03] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:28:05] (WARNING) application.call_method():173 - Exception while calling pool.dataset.query(*[])
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 91, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 45, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 33, in _call
with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in _init_
self._ws.connect()
File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.8/site-packages/ws4py/client/_init_.py", line 223, in connect
bytes = self.sock.recv(128)
socket.timeout: timed out
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 137, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1202, in _call
return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1106, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 977, in nf
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 2621, in query
self.__transform(self.middleware.call_sync(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1253, in call_sync
return self.run_coroutine(self._call_worker(name, *prepared_call.args))
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1289, in run_coroutine
return fut.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1205, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1132, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1106, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
socket.timeout: timed out
[2020/08/24 15:28:25] (DEBUG) middlewared.logger.CrashReporting.report():108 - Sending a crash report...
[2020/08/24 15:28:25] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:28:43] (DEBUG) urllib3.connectionpool._new_conn():955 - Starting new HTTPS connection (1): sentry.ixsystems.com:443
[2020/08/24 15:28:43] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:29:19] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:29:20] (DEBUG) urllib3.connectionpool._make_request():428 - https://sentry.ixsystems.com:443 "POST /api/2/store/ HTTP/1.1" 200 41
[2020/08/24 15:29:40] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:29:40] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1233, in call
return await self._call(
File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1191, in _call
return await methodobj(*prepared_call.args)
File "/usr/local/lib/python3.8/site-packages/middlewared/schema.py", line 973, in nf
return await f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/datastore/read.py", line 163, in query
result = await self._queryset_serialize(
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/datastore/read.py", line 213, in _queryset_serialize
result.append(await self._serialize(
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/datastore/read.py", line 222, in _serialize
data = self._serialize_row(obj, table, aliases)
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/datastore/read.py", line 247, in _serialize_row
data[column.name] = obj[column]
File "/usr/local/lib/python3.8/site-packages/middlewared/sqlalchemy.py", line 49, in _result_processor
return decrypt(value) if value else ''
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pwenc.py", line 93, in decrypt
cipher = AES.new(PWEncService.get_secret(), AES.MODE_CTR, counter=Counter.new(64, prefix=nonce))
File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pwenc.py", line 59, in get_secret
with open(PWENC_FILE_SECRET, 'rb') as f:
[2020/08/24 15:29:59] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:31:01] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:31:20] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:31:40] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:31:55] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:32:13] (WARNING) middlewared.job.add():202 - There are 42267 jobs waiting or running
[2020/08/24 15:32:32] (WARNING) middlewared.job.add():202 - There are 42268 jobs waiting or running
[2020/08/24 15:32:45] (WARNING) middlewared._loop_monitor_thread():1404 - Task seems blocked:
File "/usr/local/bin/middlewared", line 10, in <module>
sys.exit(main())