Graphs on Overview page clearing on refresh / Not able to export data with newest version

  1. No, worker06 is on 25.5.1. Only worker01 (not used in the experiment at all) is older; it keeps reverting to 25.2.11. I’ve tried updating worker01 stepwise through each intermediate release multiple times, and each update gives a warning that things were not copied correctly from the leader and to check the logs, immediately followed by the “update successful” message. Whenever I restart this worker or wait a few hours, it reverts back to 25.2.11. After my most recent attempt at fixing this by updating manually through terminal access, once I power cycled following the “successful” updates, this worker can’t be located by the leader (it shows up greyed out in the inventory, but can be SSH-ed into and responds).
  2. Each time the page was reloaded, a few points would populate with an x-axis scale that was completely wrong (in retrospect this was just the chart filling in new points over time), and the scale gradually approached one that made more sense (with points from slightly before the reload up to the current time post-reload).
  3. It’s been set to “All time” the whole time. As of now I can see data from ~11:30 this morning, when I was testing things, and consistent with that, mqtt_to_db_streaming has been active since then (see below):
sudo systemctl status pioreactor_startup_run@mqtt_to_db_streaming.service
● pioreactor_startup_run@mqtt_to_db_streaming.service - Start up mqtt_to_db_streaming on boot.
     Loaded: loaded (/etc/systemd/system/pioreactor_startup_run@.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-05-12 10:17:17 EDT; 4h 5min ago
   Main PID: 743 (pio)
      Tasks: 5 (limit: 3977)
        CPU: 1min 14.450s
     CGroup: /system.slice/system-pioreactor_startup_run.slice/pioreactor_startup_run@mqtt_to_db_streaming.service
             └─743 /usr/bin/python3 /usr/local/bin/pio run mqtt_to_db_streaming

May 12 14:22:23 leader pio[743]: 2025-05-12T14:22:23-0400 WARNING [mqtt_to_db_streaming] Encountered error in saving to DB: Object m>
May 12 14:22:23 leader pio[743]: 2025-05-12T14:22:23-0400 DEBUG  [mqtt_to_db_streaming] Error in parse_od. Payload that caused error>
May 12 14:22:23 leader pio[743]: Traceback (most recent call last):
May 12 14:22:23 leader pio[743]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_stream>
May 12 14:22:23 leader pio[743]:     new_rows = parser(message.topic, message.payload)
May 12 14:22:23 leader pio[743]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 12 14:22:23 leader pio[743]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_stream>
May 12 14:22:23 leader pio[743]:     od_reading = msgspec_loads(payload, type=structs.ODReading)
May 12 14:22:23 leader pio[743]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 12 14:22:23 leader pio[743]: msgspec.ValidationError: Object missing required field `calibrated`

Another note: lighttpd.service was already present in pioreactor_startup_run@.service. Even though the steps you’ve mentioned appear to have fixed the issue, I’d like to figure out more about what caused it and how to prevent it in the future, since it appears to have first failed about 1.5 seconds after I started my automations, without any way for me to know (data was populating over time, and mqtt_to_db_streaming and monitor were listed as on in the leader inventory page).

  1. The reason I am looking at worker06 is that it is producing data that isn’t compatible anymore: it’s firing data that is missing the “calibrated” field. Is it part of an experiment (Pioreactor Aeration (20 mL reactor)) that is old and has been running for a long time? If so, you’ll need to restart its od_reading job.

What I think happened was that mqtt_to_db_streaming failed to save data¹, so nothing was being written to the database =>

  1. exports were empty,
  2. Overview charts were populating incorrectly (as you mention, this was just new live data points)

When you restarted mqtt_to_db_streaming (which I think you did, and you can ignore the “mqtt_to_db_streaming is already running” for now), new mqtt_to_db_streaming code was loaded, which rejects data without the “calibrated” field =>

  1. that worker06 error.
  2. data being saved properly again.

it appeared to have first failed 1.5 seconds into when I started my automations

Sorry, I’m not sure what you’re referring to here that failed.


¹ I’m not yet sure how this happened; still investigating. But it’s possible that it was running old code, and rejecting data that had the “calibrated” field.
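
For illustration, here is a minimal sketch of the failure mode in the traceback above, using a simplified stand-in struct rather than the actual pioreactor structs.ODReading definition: msgspec refuses to decode a payload that is missing a field the struct declares as required.

import msgspec

# Simplified stand-in for structs.ODReading (hypothetical field names).
class ODReading(msgspec.Struct):
    od: float
    timestamp: str
    calibrated: bool  # the field the newer release expects

# Shaped like what an older od_reading job might publish: no "calibrated" key.
old_payload = b'{"od": 0.42, "timestamp": "2025-05-12T14:22:23-04:00"}'

try:
    msgspec.json.decode(old_payload, type=ODReading)
except msgspec.ValidationError as e:
    print(e)  # Object missing required field `calibrated`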

No, Pioreactor Aeration (20 mL reactor) was also started on Saturday (I just separated things into two experiments so I could easily see replicates grouped per condition). I had used these reactors for experiments a few months prior, but no jobs were currently active, so unless something was not fully killed when that experiment was ended, that shouldn’t be the case. These reactors (including worker06) have also been power cycled a few times since that earlier experiment ended, which I’d think further rules out any lingering active jobs. I also double-checked, and no other experiments have any profiles running. I cancelled the od_reading job on worker06, but when I tried restarting it, the button in the UI kept spinning without actually starting it (nor could I get it to work by SSH-ing into worker06; see below):

2025-05-12T16:47:18-0400 [od_reading] DEBUG Init.
2025-05-12T16:47:18-0400 [od_reading] DEBUG Using ADC class Pico_ADC.
2025-05-12T16:47:18-0400 [od_reading] DEBUG ADC ready to read from PD channels 2, 1, with gain 1.
2025-05-12T16:47:20-0400 [od_reading] DEBUG AC hz estimate: 50.0
2025-05-12T16:47:21-0400 [od_reading] DEBUG ADC offsets: {'2': 1087.5408408089486, '1': 1084.4120570164077}, and in voltage: {'2': 0.05477540864880235, '1': 0.054617823384526025}
2025-05-12T16:47:21-0400 [od_reading] DEBUG Starting od_reading with PD channels {'2': '90'}, with IR LED intensity 70.0% from channel A, every 5.0 seconds
2025-05-12T16:47:21-0400 [od_reading] INFO Ready.
2025-05-12T16:47:21-0400 [od_reading] DEBUG od_reading is blocking until disconnected.
2025-05-12T16:47:22-0400 [growth_rate_calculating] DEBUG Late arriving data: timestamp=datetime.datetime(2025, 5, 12, 20, 47, 21, 302855, tzinfo=datetime.timezone.utc), self.time_of_previous_observation=datetime.datetime(2025, 5, 12, 20, 50, 34, 608744, tzinfo=datetime.timezone.utc)
2025-05-12T16:47:22-0400 [growth_rate_calculating] DEBUG Updating Kalman Filter failed with Late arriving data: {timestamp=}, {self.time_of_previous_observation=}
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/growth_rate_calculating.py", line 419, in update_state_from_observation
    ) = self._update_state_from_observation(od_readings)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/growth_rate_calculating.py", line 457, in _update_state_from_observation
    raise ValueError("Late arriving data: {timestamp=}, {self.time_of_previous_observation=}")
ValueError: Late arriving data: {timestamp=}, {self.time_of_previous_observation=}
2025-05-12T16:47:27-0400 [growth_rate_calculating] DEBUG Late arriving data: timestamp=datetime.datetime(2025, 5, 12, 20, 47, 26, 292019, tzinfo=datetime.timezone.utc), self.time_of_previous_observation=datetime.datetime(2025, 5, 12, 20, 50, 34, 608744, tzinfo=datetime.timezone.utc)
2025-05-12T16:47:27-0400 [growth_rate_calculating] DEBUG Updating Kalman Filter failed with Late arriving data: {timestamp=}, {self.time_of_previous_observation=}
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/growth_rate_calculating.py", line 419, in update_state_from_observation
    ) = self._update_state_from_observation(od_readings)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/growth_rate_calculating.py", line 457, in _update_state_from_observation
    raise ValueError("Late arriving data: {timestamp=}, {self.time_of_previous_observation=}")
ValueError: Late arriving data: {timestamp=}, {self.time_of_previous_observation=}
2025-05-12T16:47:32-0400 [growth_rate_calculating] DEBUG Late arriving data: timestamp=datetime.datetime(2025, 5, 12, 20, 47, 31, 290443, tzinfo=datetime.timezone.utc), self.time_of_previous_observation=datetime.datetime(2025, 5, 12, 20, 50, 34, 608744, tzinfo=datetime.timezone.utc)
2025-05-12T16:47:32-0400 [growth_rate_calculating] DEBUG Updating Kalman Filter failed with Late arriving data: {timestamp=}, {self.time_of_previous_observation=}
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/growth_rate_calculating.py", line 419, in update_state_from_observation
    ) = self._update_state_from_observation(od_readings)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/growth_rate_calculating.py", line 457, in _update_state_from_observation
    raise ValueError("Late arriving data: {timestamp=}, {self.time_of_previous_observation=}")
ValueError: Late arriving data: {timestamp=}, {self.time_of_previous_observation=}
2025-05-12T16:47:34-0400 [od_reading] DEBUG Exiting caused by signal Interrupt.
2025-05-12T16:47:34-0400 [od_reading] INFO Disconnected.
2025-05-12T16:47:34-0400 [growth_rate_calculating] DEBUG Decode error in `` to structs.ODReadings
2025-05-12T16:47:35-0400 [od_reading] DEBUG Disconnected successfully from MQTT.
2025-05-12T16:48:58-0400 [temperature_automation] DEBUG features={'previous_heater_dc': 37.49, 'room_temp': 22.0, 'is_rpi_zero': False, 'volume': 14.0, 'time_series_of_temp': [40.9375, 39.625, 38.625, 37.760416666666664, 37.041666666666664, 36.375, 35.8125, 35.291666666666664, 34.8125, 34.385416666666664, 34.041666666666664, 33.6875, 33.375, 33.0625, 32.8125, 32.5625, 32.3125, 32.125, 31.927083333333332, 31.739583333333332, 31.5625]}
2025-05-12T16:48:58-0400 [temperature_automation] DEBUG PID output = -1.2099999999999964
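
For context on the repeated traceback above: the growth-rate job remembers the timestamp of the most recent OD reading it has processed and rejects anything stamped earlier than that. Here the remembered observation (20:50:34 UTC) is newer than everything the restarted od_reading job publishes (~20:47 UTC), so every new reading is treated as late. A rough sketch of that check, with timestamps taken from the log; the real logic lives in growth_rate_calculating.py and may differ in detail:

from datetime import datetime, timezone

# Values copied from the log above.
time_of_previous_observation = datetime(2025, 5, 12, 20, 50, 34, 608744, tzinfo=timezone.utc)
timestamp = datetime(2025, 5, 12, 20, 47, 21, 302855, tzinfo=timezone.utc)

if timestamp <= time_of_previous_observation:
    # The log prints the braces literally because the original message is not an f-string.
    raise ValueError(f"Late arriving data: {timestamp=}, {time_of_previous_observation=}")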

The “first 1.5 seconds” thing was just that, judging by the timing of the first error messages I got, mqtt_to_db_streaming failed pretty much immediately after I started my automations/experiment (i.e. it wasn’t something that happened later in the course of jobs running). Sorry that wasn’t clear.

Do you think that, now that mqtt_to_db_streaming is restarted, it’s “safe” for me to start new experiments (as long as I exclude worker06 for now), or should I hold off until we get to the bottom of this? Thank you so much for your help, and let me know if there are any further details I can provide!

And does worker06 say it’s on 25.5.1 on the inventory page, even after a restart? My other guess is that there is something weird going on with where the pioreactor software is installed, and there might be two software versions installed. This might explain what’s going on with worker01 too. As a test, can you run:

python -c "import importlib.util, pkg_resources, sys; m='pioreactor'; spec=importlib.util.find_spec(m); print(spec.origin); print(pkg_resources.get_distribution(m))"

after sshing into worker06 and worker01? And, as a control, run the same on a different worker, too.
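
If it helps, here is a slightly longer check that lists every copy of the package visible on sys.path, which would surface a user-level install shadowing the system-wide one (standard library only; importlib.metadata needs Python 3.8+):

from importlib.metadata import distributions

# Print the version and location of every pioreactor distribution that Python
# can see on sys.path; more than one line printed means duplicate installs.
for dist in distributions():
    if (dist.metadata["Name"] or "").lower() == "pioreactor":
        print(dist.version, dist.locate_file(""))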

When I first restarted, worker06 was on 25.5.1, but after about a minute it reverted to 25.2.20, so I think you might be right that something weird is going on there.

I ran that line you put on worker01 and worker02 (control) and got back from both of them:

/usr/local/lib/python3.11/dist-packages/pioreactor/__init__.py
pioreactor 25.5.1

worker06 was the same, except it had 25.2.20 instead. This seems bizarre to me, as worker01 was the one that was originally reverting (though to an older build than 06 is on now), yet now it says it’s on 25.5.1. However, I think there’s something more going on with worker01: it still remains completely greyed out in the inventory, and when I try pio blink it doesn’t flash (the blue LED doesn’t illuminate when I physically press the button either).

For the workers that are experiencing this “phantom” versioning (worker06), try the following:

pip uninstall pioreactor -y

And then try the test command again:

python -c "import importlib.util, pkg_resources, sys; m='pioreactor'; spec=importlib.util.find_spec(m); print(spec.origin); print(pkg_resources.get_distribution(m))"

I think you’ll get 25.5.1 now¹. It’s probably a good idea to restart it too, with sudo reboot now.


If worker01 isn’t responding to pio blink and is greyed out, then something is wrong with the monitor job. Try the following:

sudo systemctl stop pioreactor_startup_run@monitor.service

and then run:

pio run monitor

Any errors? You can ctrl-c to stop that job, and

sudo systemctl start pioreactor_startup_run@monitor.service

to start it in the background again.
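
As a side note on what the greyed-out status reflects: the leader’s view of a worker comes from the state its monitor job reports over MQTT (hence the monitor-focused advice above). If you want to peek at that raw state yourself, something like the sketch below should work from the leader; the topic layout (pioreactor/<unit>/<experiment>/monitor/$state) and an unauthenticated broker are assumptions on my part, so adjust for your install (and pass auth={"username": ..., "password": ...} if your broker requires credentials).

import paho.mqtt.subscribe as subscribe

# Read the retained $state message from worker01's monitor job. Run on the
# leader, which hosts the MQTT broker. The '+' wildcards the experiment name.
# Note: this blocks until a message (retained or live) is received.
msg = subscribe.simple(
    "pioreactor/worker01/+/monitor/$state",
    hostname="localhost",
    retained=True,  # state messages are published as retained
)
print(msg.topic, msg.payload.decode())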


¹ I don’t have a clear reason why two versions might be installed. Our software installs the pioreactor Python library system-wide, using sudo pip install pioreactor, but if someone uses pip install pioreactor (no sudo), it will install for the current user only and be the preferred version when pio is invoked.
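
To make the shadowing concrete: pip install without sudo lands in the per-user site-packages directory, and Python normally puts that directory ahead of the system-wide dist-packages on sys.path, so import pioreactor (and therefore pio) resolves to the user copy first. A quick way to see the ordering on any worker:

import site
import sys

# The user site-packages directory (if enabled) normally precedes the system
# dist-packages entries in sys.path, so a user-level install wins the import.
print(site.getusersitepackages())
print(site.getsitepackages())
print(sys.path)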

When I tried just pip uninstall..., I got an exception PermissionError: [Errno 13] Permission denied: '/usr/local/bin/pio', so I tried with sudo and that worked, but then I got:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'origin'

So there doesn’t seem to have been a double install, which is even weirder. Following a reboot, the version still comes back as NoneType and no pio-based commands work (because the only installed version was uninstalled).

For the worker01 issue, all the commands ran without errors (either in the terminal or in the logs), but they did not solve the problem (it is still behaving the same).

Is it still helpful for your team to keep diagnosing the issue here to prevent future bugs? If not, it’s probably faster for me to just pull my calibrations off and reflash these two from fresh installs.

Yea, do what’s best for you and reflash. I don’t think there is a simple way out of this unfortunately. I think I have some ideas on preventative solutions from our discussion, too.
