Graphs on Overview page clearing on refresh / Not able to export data with newest version

Hey,

I installed the 24.12.10 update for the app and UI on my Pios last week and noticed that when I refresh the overview page, all the graphs are cleared and start fresh from that point in time. In addition, trying to export data from a finished experiment results in a “Page not found” error (the URL points to the export zip file).

The growth rate chart starts plotting growth rates with every new OD reading after refreshing the overview page so I guess it is more a UI bug than a data storage problem, but idk.

Thanks for any help!

Best, Kai

We are having the same issue. Can you also no longer export any data? I am worried that none of the data is being saved on the leader anymore.

I have had this issue recently, but thought it was just because I’m mucking around with the code. In my case the mqtt_to_db_streaming.service wasn’t running, which meant that data wasn’t being saved to the database. You can check if this is the case for you with:

sudo systemctl status pioreactor_startup_run@mqtt_to_db_streaming.service

(Should say active (running))

Here are two potential ways I check data flow.

  • Running pio mqtt should give you the stream of data going from wherever your job is to MQTT, so you know that readings are being taken (I think you already know this is happening if it’s being plotted in the graph)
  • Running pio db puts you in an sqlite shell, where you can query the database to see if your data are there. For OD, I guess the readings are in the table “od_readings”. If you don’t know SQL, ChatGPT is pretty good at helping.
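
For anyone who would rather script the second check than type SQL by hand, here is a minimal sketch using Python's built-in sqlite3 module. It only counts rows, so running it twice a minute or so apart and comparing the numbers tells you whether new readings are arriving. Assumptions: the table name “od_readings” is the guess from the post above, the two columns in the demo table are hypothetical, and the demo runs against an in-memory database because the real database path is not given in this thread.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in `table`, or -1 if the table does not exist."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists is None:
        return -1
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


# Self-contained demo against an in-memory database with a hypothetical schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE od_readings (timestamp TEXT, od_reading REAL)")
demo.executemany(
    "INSERT INTO od_readings VALUES (?, ?)",
    [("2025-01-02T11:17:19", 0.11), ("2025-01-02T11:17:24", 0.12)],
)
print(count_rows(demo, "od_readings"))  # 2
print(count_rows(demo, "no_such_table"))  # -1
```

On a real leader you would pass a connection to the actual database file instead of the in-memory demo. If the count stays flat while an experiment is running, that points at the streaming job rather than the UI.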

Yes, this job must be running for data to be saved to the database. I’m not sure why it’s not running, but we are investigating now.

As @noahsprent mentioned, on your leader, check the status with:

sudo systemctl status pioreactor_startup_run@mqtt_to_db_streaming.service

And if it says anything other than active (running), try to restart it:

sudo systemctl restart pioreactor_startup_run@mqtt_to_db_streaming.service
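
The check-then-restart steps above could also be scripted. This is a rough sketch, not something shipped with Pioreactor: it relies on `systemctl is-active`, which prints a single word such as active, inactive or failed, and the function names are made up for illustration.

```python
import subprocess

UNIT = "pioreactor_startup_run@mqtt_to_db_streaming.service"


def is_running(is_active_output: str) -> bool:
    """True only when `systemctl is-active` reported exactly 'active'."""
    return is_active_output.strip() == "active"


def query_state(unit: str = UNIT) -> str:
    """Ask systemd for the unit's state, e.g. 'active', 'inactive' or 'failed'."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero when the unit is not active
    )
    return result.stdout.strip()


def restart(unit: str = UNIT) -> None:
    """Restart the unit; needs sudo, like the manual command."""
    subprocess.run(["sudo", "systemctl", "restart", unit], check=True)
```

On the leader, `if not is_running(query_state()): restart()` would mirror the manual steps. Nothing runs on import, so the helpers can be tried out safely first.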

Try this solution:


Run on the leader:

pio kill --job-name monitor
rm /tmp/local_intermittent_pioreactor_metadata.sqlite
sudo systemctl restart pioreactor_startup_run@mqtt_to_db_streaming.service
sudo systemctl restart pioreactor_startup_run@monitor.service

And check that things work with:

sudo systemctl status pioreactor_startup_run@mqtt_to_db_streaming.service

Good morning,

Thank you so much for the quick responses. The mqtt_to_db_streaming service was indeed not running; from the status check, it looks like it tried to start up but crashed. A picture of the log is attached (sorry for that, I don’t have internet access from that PC). Is there a way to recover the data from the workers that has not been saved to the db?

Best, Kai

@Jackd4w is it working now after running the steps I suggested above?

Unfortunately, data produced while mqtt_to_db_streaming is not running is not saved. We apologize. We will try to have better recovery options in the future.

I can confirm that this happens for me each time I restart the Pioreactor with 24.12.10, and the steps above fix it until the next restart.

@gendor, hm, the fix should be permanent…

When you restart, and confirm the problem is happening, can you report what

sudo systemctl status pioreactor_startup_run@mqtt_to_db_streaming.service

says?

My last experiments, run on the 24.12.10 firmware, did not record data. The writing of data definitely needs to be more robust. I have to rerun those experiments but will be waiting on a stable fix for this.

To @DocRuzzy and others who experienced this problem,

I’m very sorry, and we are responsible for your wasted time and energy. We are making at least two useful changes to help avoid this problem in the future.

  1. We do have a job that detects and notifies the user if mqtt_to_db_streaming isn’t working, but since mqtt_to_db_streaming isn’t working, that notification isn’t added to the database, and so a user won’t see it in the “Event logs” (they may catch it in the notification pop-up every 6 hours, but it’s unlikely) - an annoying catch-22. We’ve changed this so that the notification will continue to pop up until it’s resolved. While a repeated notification is annoying, it’s trying to tell you something very important and should be addressed immediately.

  2. For better visibility into the important jobs on the leader Pioreactor, we’re adding a new Leader webpage. Here’s an example where the mqtt_to_db_streaming job is offline:

    Notice the pop up on the right, and the red Off in the table.

  3. We’ll also commit to better testing of updates of our software. Previously, a user called our software “fussy”, and that really stung, but what’s worse is that the software is still too fussy. We’ll make better testing infrastructure an early-2025 goal.


Thanks for looking into this @CamDavidsonPilon! I appreciate all the improvements and understand that this sometimes means things can be a little unstable. I guess another possibility would be to have a beta flag for new releases so that they can get some real-world usage before being declared stable?

If it’s still useful, here is the output after I restart my Pioreactor on 24.12.10:

× pioreactor_startup_run@mqtt_to_db_streaming.service - Start up mqtt_to_db_streaming on boot.
     Loaded: loaded (/lib/systemd/system/pioreactor_startup_run@.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2025-01-02 11:25:36 GMT; 17min ago
   Duration: 8min 18.487s
    Process: 688 ExecStart=pio run mqtt_to_db_streaming (code=exited, status=1/FAILURE)
   Main PID: 688 (code=exited, status=1/FAILURE)
        CPU: 1.843s

Jan 02 11:17:20 leader pio[688]:     self.on_disconnected()
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_streaming.py", line 97, in on_disconnected
Jan 02 11:17:20 leader pio[688]:     self.timer.cancel()
Jan 02 11:17:20 leader pio[688]:     ^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]: AttributeError: 'MqttToDBStreamer' object has no attribute 'timer'
Jan 02 11:17:20 leader pio[688]: 2025-01-02T11:17:20+0000 INFO   [mqtt_to_db_streaming] Disconnected.
Jan 02 11:25:34 leader pio[688]: 2025-01-02T11:25:34+0000 DEBUG  [mqtt_to_db_streaming] Disconnected successfully from MQTT.
Jan 02 11:25:36 leader systemd[1]: pioreactor_startup_run@mqtt_to_db_streaming.service: Main process exited, code=exited, status=1/FAILURE
Jan 02 11:25:36 leader systemd[1]: pioreactor_startup_run@mqtt_to_db_streaming.service: Failed with result 'exit-code'.
Jan 02 11:25:36 leader systemd[1]: pioreactor_startup_run@mqtt_to_db_streaming.service: Consumed 1.843s CPU time.

Hm, the logs I need are truncated in that view. Can you try:

sudo journalctl -u pioreactor_startup_run@mqtt_to_db_streaming.service -b

Jan 02 11:17:17 leader systemd[1]: Started pioreactor_startup_run@mqtt_to_db_streaming.service - Start up mqtt_to_db_streaming on boot..
Jan 02 11:17:19 leader pio[688]: 2025-01-02T11:17:19+0000 DEBUG  [mqtt_to_db_streaming] Init.
Jan 02 11:17:19 leader pio[688]: 2025-01-02T11:17:19+0000 DEBUG  [mqtt_to_db_streaming] Streaming MQTT data to /home/pioreactor/.pioreactor/storage/p>
Jan 02 11:17:19 leader pio[688]: 2025-01-02T11:17:19+0000 DEBUG  [mqtt_to_db_streaming] Listening to [TopicToParserToTable(topic='pioreactor/+/+/spec>
Jan 02 11:17:19 leader pio[688]: Traceback (most recent call last):
Jan 02 11:17:19 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/mureq.py", line 167, in yield_response
Jan 02 11:17:19 leader pio[688]:     conn.request(method, path, headers=headers, body=body)
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/http/client.py", line 1282, in request
Jan 02 11:17:19 leader pio[688]:     self._send_request(method, url, body, headers, encode_chunked)
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/http/client.py", line 1328, in _send_request
Jan 02 11:17:19 leader pio[688]:     self.endheaders(body, encode_chunked=encode_chunked)
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/http/client.py", line 1277, in endheaders
Jan 02 11:17:19 leader pio[688]:     self._send_output(message_body, encode_chunked=encode_chunked)
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/http/client.py", line 1037, in _send_output
Jan 02 11:17:19 leader pio[688]:     self.send(msg)
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/http/client.py", line 975, in send
Jan 02 11:17:19 leader pio[688]:     self.connect()
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/http/client.py", line 941, in connect
Jan 02 11:17:19 leader pio[688]:     self.sock = self._create_connection(
Jan 02 11:17:19 leader pio[688]:                 ^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/socket.py", line 851, in create_connection
Jan 02 11:17:19 leader pio[688]:     raise exceptions[0]
Jan 02 11:17:19 leader pio[688]:   File "/usr/lib/python3.11/socket.py", line 836, in create_connection
Jan 02 11:17:19 leader pio[688]:     sock.connect(sa)
Jan 02 11:17:19 leader pio[688]: ConnectionRefusedError: [Errno 111] Connection refused
Jan 02 11:17:19 leader pio[688]: The above exception was the direct cause of the following exception:
Jan 02 11:17:19 leader pio[688]: Traceback (most recent call last):
Jan 02 11:17:19 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/whoami.py", line 52, in _get_assigned_experiment_name
Jan 02 11:17:19 leader pio[688]:     result = get_from_leader(f"/api/workers/{unit_name}/experiment")
Jan 02 11:17:19 leader pio[688]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:19 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/pubsub.py", line 374, in get_from_leader
Jan 02 11:17:19 leader pio[688]:     return get_from(leader_address, endpoint, **kwargs)
Jan 02 11:17:19 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/pubsub.py", line 370, in get_from
Jan 02 11:17:20 leader pio[688]:     return mureq.get(create_webserver_path(address, endpoint), **kwargs)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/mureq.py", line 68, in get
Jan 02 11:17:20 leader pio[688]:     return request("GET", url=url, **kwargs)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/mureq.py", line 51, in request
Jan 02 11:17:20 leader pio[688]:     with yield_response(method, url, **kwargs) as response:
Jan 02 11:17:20 leader pio[688]:   File "/usr/lib/python3.11/contextlib.py", line 137, in __enter__
Jan 02 11:17:20 leader pio[688]:     return next(self.gen)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/mureq.py", line 174, in yield_response
Jan 02 11:17:20 leader pio[688]:     raise HTTPException(str(e)) from e
Jan 02 11:17:20 leader pio[688]: http.client.HTTPException: [Errno 111] Connection refused
Jan 02 11:17:20 leader pio[688]: During handling of the above exception, another exception occurred:
Jan 02 11:17:20 leader pio[688]: Traceback (most recent call last):
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/bin/pio", line 8, in <module>
Jan 02 11:17:20 leader pio[688]:     sys.exit(pio())
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1157, in __call__
Jan 02 11:17:20 leader pio[688]:     return self.main(*args, **kwargs)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1078, in main
Jan 02 11:17:20 leader pio[688]:     rv = self.invoke(ctx)
Jan 02 11:17:20 leader pio[688]:          ^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1688, in invoke
Jan 02 11:17:20 leader pio[688]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
Jan 02 11:17:20 leader pio[688]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1688, in invoke
Jan 02 11:17:20 leader pio[688]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
Jan 02 11:17:20 leader pio[688]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1434, in invoke
Jan 02 11:17:20 leader pio[688]:     return ctx.invoke(self.callback, **ctx.params)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 783, in invoke
Jan 02 11:17:20 leader pio[688]:     return __callback(*args, **kwargs)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_streaming.py", line 508>
Jan 02 11:17:20 leader pio[688]:     job = start_mqtt_to_db_streaming()
Jan 02 11:17:20 leader pio[688]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_streaming.py", line 496>
Jan 02 11:17:20 leader pio[688]:     return MqttToDBStreamer(get_unit_name(), UNIVERSAL_EXPERIMENT, source_to_sinks)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/base.py", line 105, in __call__
Jan 02 11:17:20 leader pio[688]:     obj = type.__call__(cls, *args, **kwargs)
Jan 02 11:17:20 leader pio[688]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_streaming.py", line 86,>
Jan 02 11:17:20 leader pio[688]:     self.timer = RepeatedTimer(60, self.write_stats).start()
Jan 02 11:17:20 leader pio[688]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/timing.py", line 112, in __init__
Jan 02 11:17:20 leader pio[688]:     self.logger = create_logger(
Jan 02 11:17:20 leader pio[688]:                   ^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/logging.py", line 178, in create_logger
Jan 02 11:17:20 leader pio[688]:     experiment = get_assigned_experiment_name(unit)
Jan 02 11:17:20 leader pio[688]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/whoami.py", line 36, in get_assigned_experiment_name
Jan 02 11:17:20 leader pio[688]:     return _get_assigned_experiment_name(unit_name)
Jan 02 11:17:20 leader pio[688]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/whoami.py", line 66, in _get_assigned_experiment_name
Jan 02 11:17:20 leader pio[688]:     raise mureq.HTTPException(
Jan 02 11:17:20 leader pio[688]: http.client.HTTPException: Not able to access experiments in UI. Check http://127.0.0.1 is online and check network.
Jan 02 11:17:20 leader pio[688]: 2025-01-02T11:17:20+0000 DEBUG  [mqtt_to_db_streaming] Exiting caused by Python atexit.
Jan 02 11:17:20 leader pio[688]: 2025-01-02T11:17:20+0000 DEBUG  [mqtt_to_db_streaming] Error in on_disconnected:
Jan 02 11:17:20 leader pio[688]: 2025-01-02T11:17:20+0000 DEBUG  [mqtt_to_db_streaming] 'MqttToDBStreamer' object has no attribute 'timer'
Jan 02 11:17:20 leader pio[688]: Traceback (most recent call last):
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/base.py", line 769, in disconnected
Jan 02 11:17:20 leader pio[688]:     self.on_disconnected()
Jan 02 11:17:20 leader pio[688]:   File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/leader/mqtt_to_db_streaming.py", line 97,>
Jan 02 11:17:20 leader pio[688]:     self.timer.cancel()
Jan 02 11:17:20 leader pio[688]:     ^^^^^^^^^^
Jan 02 11:17:20 leader pio[688]: AttributeError: 'MqttToDBStreamer' object has no attribute 'timer'
Jan 02 11:17:20 leader pio[688]: 2025-01-02T11:17:20+0000 INFO   [mqtt_to_db_streaming] Disconnected.
Jan 02 11:25:34 leader pio[688]: 2025-01-02T11:25:34+0000 DEBUG  [mqtt_to_db_streaming] Disconnected successfully from MQTT.
Jan 02 11:25:36 leader systemd[1]: pioreactor_startup_run@mqtt_to_db_streaming.service: Main process exited, code=exited, status=1/FAILURE
Jan 02 11:25:36 leader systemd[1]: pioreactor_startup_run@mqtt_to_db_streaming.service: Failed with result 'exit-code'.
Jan 02 11:25:36 leader systemd[1]: pioreactor_startup_run@mqtt_to_db_streaming.service: Consumed 1.843s CPU time.
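
A note for anyone reading the traceback: the AttributeError about 'timer' at the end looks like a follow-on symptom rather than the root cause. The first failure is the refused connection while the timer is being created, so the assignment to self.timer never completes, and the cleanup that runs at exit then calls on_disconnected on a half-built object. Below is a minimal sketch of that pattern and one defensive way to guard against it. Everything here is hypothetical stand-in code (FlakyTimer, Job and the method names are invented), not the actual Pioreactor source.

```python
class FlakyTimer:
    """Stand-in for a timer whose constructor can fail (hypothetical class)."""

    def __init__(self, fail: bool) -> None:
        if fail:
            raise ConnectionError("web server not reachable yet")
        self.cancelled = False

    def cancel(self) -> None:
        self.cancelled = True


class Job:
    """Hypothetical job; not the real MqttToDBStreamer."""

    def __init__(self, fail: bool) -> None:
        # If FlakyTimer raises, this assignment never happens,
        # so the instance is left without a `timer` attribute.
        self.timer = FlakyTimer(fail)

    def on_disconnected_unsafe(self) -> None:
        self.timer.cancel()  # AttributeError if __init__ failed early

    def on_disconnected_safe(self) -> None:
        # Only cancel the timer if it was actually created.
        timer = getattr(self, "timer", None)
        if timer is not None:
            timer.cancel()


# Build an instance whose __init__ fails part-way, as in the log.
job = Job.__new__(Job)
try:
    job.__init__(fail=True)
except ConnectionError:
    pass  # the instance exists, but __init__ never finished

job.on_disconnected_safe()  # no error: the missing timer is tolerated
```

Calling job.on_disconnected_unsafe() on the same object raises the AttributeError seen in the log. Guarding the cleanup would only tidy up the second error; the job would still fail to start until the first one is dealt with.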


Right, so it’s a race condition between when this job starts, and when the server starts. A solution you can use:

  1. Edit the main systemd file with:
    sudo nano /etc/systemd/system/pioreactor_startup_run@.service
    
  2. Add lighttpd.service to the end of the After= line, e.g.:
    After=network-online.target firstboot.service load_rp2040.service local_access_point.service lighttpd.service
    
  3. Exit and save with ctrl-x, then reboot your Pioreactor (sudo reboot)
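
To illustrate the race described above: the job asks the web server on the leader for its experiment while the server may not be listening yet, and the connection is refused. The After= change is the fix suggested in this thread. As a complementary illustration only, a job could also wait until the server accepts TCP connections before doing anything else. The sketch below is an assumption-laden example and not how Pioreactor handles it: wait_for_port is a made-up helper, and port 80 in the usage note is inferred from the plain http://127.0.0.1 address in the log.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers ConnectionRefusedError (Errno 111) and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as wait_for_port("127.0.0.1", 80) before the first request would turn an immediate crash into a short wait. It is worth knowing that start-up ordering alone does not necessarily mean a server is already accepting connections, which is why a retry like this is sometimes used alongside it.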

I don’t seem to have that systemd file on my Pioreactor?

pioreactor@leader:/etc/systemd/system $ ls
apt-daily.service                           default.target               rc-local.service.d
apt-daily-upgrade.service                   getty.target.wants           reboot.target.wants
basic.target.wants                          getty@tty1.service.d         remote-fs.target.wants
chronyd.service                             graphical.target.wants       sockets.target.wants
dbus-fi.w1.wpa_supplicant1.service          halt.target.wants            sshd.service
dbus-org.freedesktop.Avahi.service          huey.service                 sysinit.target.wants
dbus-org.freedesktop.ModemManager1.service  multi-user.target.wants      sys-subsystem-net-devices-wlan0.device.wants
dbus-org.freedesktop.nm-dispatcher.service  network-online.target.wants  timers.target.wants
dbus-org.freedesktop.timesync1.service      poweroff.target.wants

Should I just create it if it doesn’t exist?

Hm, you will likely find it in /usr/lib/systemd/system/; that was the previous location for it.
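
Since the file location differs between installs, here is a small sketch for finding it. The three directories are the ones that appear in this thread (the third comes from the "Loaded:" line in the status output above); the helper name is invented for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

CANDIDATE_DIRS = (
    "/etc/systemd/system",
    "/usr/lib/systemd/system",
    "/lib/systemd/system",
)


def find_unit_file(name: str, search_dirs: Iterable[str] = CANDIDATE_DIRS) -> Optional[Path]:
    """Return the first existing path for `name` in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

For example, find_unit_file("pioreactor_startup_run@.service") returns the path to edit, or None if the file is in none of the listed directories.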

Awesome, I think that did the trick, thanks Cam!