Bug - Monitor disconnected due to signal Terminated

When I click “Stop All Activity” on the *.local/pioreactors page, it sometimes kills the monitor job. Once the “Monitor” job becomes disconnected, I’m no longer able to control the unit from the /pioreactors page, and the unit’s status appears gray and offline (as opposed to green and online).

I think this is caused by (or related to) the monitor job getting disconnected, since that log message seems to correlate with whichever unit stops responding to commands. It usually happens when I click “Stop All Activity”; I’ve seen it happen to pioreactor1 twice, and to worker1 and the leader once each (simultaneously).

My current fix is rebooting the unit, which seems to restart the monitor service and get it working properly again.

2023-04-12T16:38:36-0700 [CLI] DEBUG Executing pio kill --all-jobs on pioreactor1.
2023-04-12T16:38:36-0700 [CLI] DEBUG Executing pio kill --all-jobs on worker1.
2023-04-12T16:38:36-0700 [CLI] DEBUG Executing pio kill --all-jobs on leader.
2023-04-12T16:38:38-0700 [monitor] DEBUG Exiting caused by signal Terminated.
2023-04-12T16:38:38-0700 [add_alt_media] INFO Stopped alt_media pump.
2023-04-12T16:38:38-0700 [add_media] INFO Stopped media pump.
2023-04-12T16:38:38-0700 [remove_waste] INFO Stopped waste pump.
2023-04-12T16:38:38-0700 [monitor] INFO Disconnected.
2023-04-12T16:38:38-0700 [monitor] DEBUG Disconnected successfully from MQTT.
2023-04-12T16:38:38-0700 [PWM@GPIO-13] DEBUG Cleaned up GPIO-13.
2023-04-12T16:38:38-0700 [PWM@GPIO-16] DEBUG Cleaned up GPIO-16.
2023-04-12T16:38:38-0700 [PWM@GPIO-17] DEBUG Cleaned up GPIO-17.

It sometimes kills the monitor job?

Can you tell me which version of the Pioreactor software you are using, specifically on the Pioreactor that is having the problem (pio version -v)? This was a bug a while ago, but it should have been resolved in version 23.3.2.

I updated the software yesterday, and I think I updated it prior to the issue occurring (but I’m not 100% positive). I’ll let you know if I experience this issue again. Is there some way to restart the monitor service without rebooting the units?

pioreactor@pioreactor1:~ $ pio version -v
Pioreactor software: 23.4.4
Pioreactor HAT: 1.0
Pioreactor firmware: 0.0
Operating system: Linux-5.15.61-v7+-armv7l-with-glibc2.31
Raspberry Pi: Raspberry Pi 3 Model B Plus Rev 1.3
Image version: 09ff029fd65f76edfc4e1634f97b98a2838e8be7
Pioreactor UI: 23.3.20

Not from the UI, but from the command line:

sudo systemctl restart pioreactor_startup_run@monitor.service
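If it’s unclear whether the restart took effect, these standard systemd commands can help confirm the service came back up (assuming the same unit name as above; a sketch, not an official Pioreactor procedure):

```shell
# Check whether the monitor service is active again
sudo systemctl status pioreactor_startup_run@monitor.service

# Follow its recent log output to watch for reconnection messages
sudo journalctl -u pioreactor_startup_run@monitor.service -n 50 -f
```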

Thanks. Now I am getting a different error. Worker1 appears to be losing its connection, and its status is shown as “Lost,” but I am still able to start jobs (e.g., stirring/temperature/dosing, etc.), and I can still see their information show up in the UI. I tried restarting worker1 once by unplugging it, but it still shows as lost. I checked the logs (on both the leader and the worker) but didn’t see anything relevant beyond a line or two about worker1 being lost. Other than showing up as “Lost,” worker1 appears to be working normally.

This behavior first started overnight (at midnight), and no other errors occurred around the same time. There doesn’t appear to be any indication of physical damage or overflow, and I wasn’t running anything special at the time.

Edit: Here is the version info for worker1.
Pioreactor software: 23.4.4
Pioreactor HAT: 1.1
Pioreactor firmware: 0.2
Operating system: Linux-5.15.61-v7+-armv7l-with-glibc2.31
Raspberry Pi: Raspberry Pi 3 Model A Plus Rev 1.0
Image version: cedf98609dae18a3ee1b1c0b6899660d782096ec

This is probably due to my crappy “watchdog” code that I want to rewrite¹. I don’t think anything is actually wrong with your Pioreactor except for that label in the UI.

Sorry for the confusion!

¹ The code/logic is supposed to confirm whether a worker is really lost, and will force a worker to appear lost if certain conditions are met, but it’s flaky and poorly implemented.