Error: storage.persistent_cache not found in config when trying to update to 25.1.21

Hi there,

Over the past few days I’ve been trying to update my cluster to 25.1.21 (I started at 24.10.29 and updated successfully to 24.12.10 first) and have run into some issues. On two of my ten workers (plus the leader), the update failed. I tried updating through the UI from the downloaded .zip, and also by pushing the zip to each Pioreactor and updating manually over SSH. When I check the logs, I get the following warning:

WARNING [read config] Not found in configuration: 'storage.persistent_cache'. Are you missing the following in your config?

[storage]
persistent_cache=some value

In the shared config.ini, the filepath is set to persistent_cache = /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite, and it isn’t overridden in the unit-specific config for either of the Pios that aren’t updating.

When I open that file on either of the two that are failing the update, it is empty in nano. On any of the workers that updated successfully, though, the file is populated (with what looks like binary data that isn’t meant to be viewed in nano).

Thinking it might be related to the change in this update that fixed the “persistant” typo, I also checked /home/pioreactor/.pioreactor/storage/local_persistant_pioreactor_metadata.sqlite, but that file is also blank in nano.
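(For what it’s worth, nano isn’t well suited to checking these, since SQLite databases are binary. A more reliable check is the file’s magic header: a real SQLite database starts with the string “SQLite format 3”. A throwaway helper along these lines, with the path taken from config.ini, could classify the files:)

```shell
# Throwaway helper: classify a file by size and SQLite magic header.
# A valid SQLite database begins with the 15-character string "SQLite format 3".
check_db() {
  if [ ! -s "$1" ]; then
    echo "missing-or-empty"
  elif head -c 15 "$1" | grep -q 'SQLite format 3'; then
    echo "sqlite"
  else
    echo "other"
  fi
}

check_db /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite
```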

Is there something I can manually add to this file to make these Pios update, or do I just need to reflash them? I had done stirring, pump, and OD calibrations on each of these, so it seems wrong that the files are empty.

Here’s the full log from one of the two in case there’s something else in it that helps diagnose the problem:

Requirement already satisfied: pyusb==1.2.1 in /usr/local/lib/python3.11/dist-packages (from pioreactor==25.1.21->pioreactor==25.1.21) (1.2.1)
Requirement already satisfied: rpi_hardware_pwm==0.2.1 in /usr/local/lib/python3.11/dist-packages (from pioreactor==25.1.21->pioreactor==25.1.21) (0.2.1)
Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.11/dist-packages (from adafruit-circuitpython-ads1x15==2.2.23->pioreactor==25.1.21->pioreactor==25.1.21) (4.12.2)
pioreactor is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.

2025-02-06T14:24:39-0500 [update_app] DEBUG sudo bash /tmp/release_25.1.21/update.sh
2025-02-06T14:24:45-0500 [read config] WARNING Not found in configuration: 'storage.persistent_cache'. Are you missing the following in your config?

[storage]
persistent_cache=some value


2025-02-06T14:24:46-0500 [update_app] DEBUG + export LC_ALL=C
+ LC_ALL=C
+++ dirname /tmp/release_25.1.21/update.sh
++ cd /tmp/release_25.1.21
++ pwd
+ SCRIPT_DIR=/tmp/release_25.1.21
++ crudini --get /home/pioreactor/.pioreactor/config.ini cluster.topology leader_hostname
+ LEADER_HOSTNAME=leader
+ STORAGE_DIR=/home/pioreactor/.pioreactor/storage
+ DB=/home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite
+ '[' '!' -f /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite ']'
+ chown -R pioreactor:www-data /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite-shm /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite-wal
+ chmod -R 770 /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite-shm /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite-wal
+ sudo -u pioreactor mkdir -p /home/pioreactor/.pioreactor/storage/calibrations/od /home/pioreactor/.pioreactor/storage/calibrations/media_pump /home/pioreactor/.pioreactor/storage/calibrations/waste_pump /home/pioreactor/.pioreactor/storage/calibrations/alt_media_pump
+ sudo pip3 install /tmp/release_25.1.21/PyYAML-6.0.2-cp311-cp311-linux_armv7l.whl
+ cp /tmp/release_25.1.21/create_diskcache.sh /usr/local/bin/create_diskcache.sh
+ sudo bash /usr/local/bin/create_diskcache.sh
+ set -e
+ export LC_ALL=C
+ LC_ALL=C
+ DIR=/tmp/pioreactor_cache
+ mkdir -p /tmp/pioreactor_cache
+ chmod -R 770 /tmp/pioreactor_cache/
+ chown -R pioreactor:www-data /tmp/pioreactor_cache/
+ chmod g+s /tmp/pioreactor_cache
+ touch /tmp/pioreactor_cache/huey.db
+ touch /tmp/pioreactor_cache/huey.db-shm
+ touch /tmp/pioreactor_cache/huey.db-wal
+ touch /tmp/pioreactor_cache/local_intermittent_pioreactor_metadata.sqlite
+ touch /tmp/pioreactor_cache/local_intermittent_pioreactor_metadata.sqlite-shm
+ touch /tmp/pioreactor_cache/local_intermittent_pioreactor_metadata.sqlite-wal
+ chmod -R 770 /tmp/pioreactor_cache/
+ sudo -u pioreactor python /tmp/release_25.1.21/cal_convert.py /home/pioreactor/.pioreactor/storage/od_calibrations/cache.db
+ sudo -u pioreactor python /tmp/release_25.1.21/cal_convert.py /home/pioreactor/.pioreactor/storage/pump_calibrations/cache.db
+ chown -R pioreactor:www-data /home/pioreactor/.pioreactor/storage/calibrations/
+ sudo -u pioreactor python /tmp/release_25.1.21/cal_active.py /home/pioreactor/.pioreactor/storage/current_pump_calibration/cache.db
2025-02-06T14:24:45-0500 WARNING [read config] Not found in configuration: 'storage.persistent_cache'. Are you missing the following in your config?

[storage]
persistent_cache=some value


Traceback (most recent call last):
  File "/usr/lib/python3.11/configparser.py", line 805, in get
    value = d[option]
            ~^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 1004, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 996, in __missing__
    raise KeyError(key)
KeyError: 'persistent_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/release_25.1.21/cal_active.py", line 43, in <module>
    main()
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/release_25.1.21/cal_active.py", line 28, in main
    with local_persistent_storage("active_calibrations") as c:
  File "/usr/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/__init__.py", line 413, in local_persistent_storage
    with cache(cache_name, db_path=config.get("storage", "persistent_cache")) as c:
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/config.py", line 92, in get
    raise e
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/config.py", line 77, in get
    return super().get(section, option, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/configparser.py", line 808, in get
    raise NoOptionError(option, section)
configparser.NoOptionError: No option 'persistent_cache' in section: 'storage'

2025-02-06T14:24:46-0500 [update_app] ERROR Update failed. See logs.

Hi Donal,

It sounds like the config isn’t syncing correctly. Here’s a quick check:

On either of the failing workers, try the following:

cat ~/.pioreactor/config.ini | grep persistent_cache

This probably won’t return anything, which matches the warning above.

Let’s manually add it though:

crudini --set ~/.pioreactor/config.ini storage persistent_cache /home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite
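Afterwards, the [storage] section of the worker’s ~/.pioreactor/config.ini should contain:

```ini
[storage]
persistent_cache=/home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite
```

(You can confirm with crudini --get ~/.pioreactor/config.ini storage persistent_cache.)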

Now try:

pio log -m "test"

to see if it works.


If so, next let’s restart some services to make sure everyone sees the new config:

sudo systemctl restart lighttpd.service
sudo systemctl restart huey.service
sudo systemctl restart pioreactor_startup_run@monitor.service

Does everything look okay after that?


I wonder why the configs aren’t syncing. On the leader, try pios sync-configs and look for any errors when talking to these two workers.

I was able to update them following this, but when I updated the rest of my cluster (I had downgraded everything back to 24.12.10), they no longer showed as online in the inventory, and when I press the button on these two Pios it does not light up blue. This persisted after power-cycling, too.

However, I am still able to ssh into them, so they are still on my local access point, which is even more bizarre. Any thoughts on how to get them reconnected properly? I was hesitant to try removing and re-adding them from/to the cluster in case they didn’t respond to that.

Also, I did have sync-configs errors popping up before I even checked it, so there’s definitely something going on there. When I tried pios sync-configs on the leader, it prompted me for the password to some but not all of the workers (including the two that had trouble updating). I entered the password correctly each time, but I got Permission denied, please try again, and eventually it timed out with io timeout after 30 seconds -- exiting. From the traceback, it looks like it had trouble with workers 1, 3, 4, 5, and 8 (3 and 5 are the ones that had updating issues).

[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(200) [sender=3.2.7]
2025-02-06T15:46:56-0500 WARNING [sync_configs] Could not transfer config to worker1. Is it online?
2025-02-06T15:46:56-0500 DEBUG  [sync_configs] Error moving file /home/pioreactor/.pioreactor/config.ini to worker1:/home/pioreactor/.pioreactor/config.ini.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 17, in rsync
    r = subprocess.run(("rsync",) + args, check=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('rsync', '-z', '--timeout', '30', '--inplace', '--checksum', '-e', 'ssh', '/home/pioreactor/.pioreactor/config.ini', 'worker1.local:/home/pioreactor/.pioreactor/config.ini')' returned non-zero exit status 30.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 25, in cp_file_across_cluster
    rsync(
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 20, in rsync
    raise RsyncError from e
pioreactor.exc.RsyncError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 584, in _thread_function
    sync_config_files(unit, shared, specific)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 174, in sync_config_files
    cp_file_across_cluster(unit, localpath, remotepath, timeout=30)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 37, in cp_file_across_cluster
    raise RsyncError(f"Error moving file {localpath} to {unit}:{remotepath}.")
pioreactor.exc.RsyncError: Error moving file /home/pioreactor/.pioreactor/config.ini to worker1:/home/pioreactor/.pioreactor/config.ini.
[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(200) [sender=3.2.7]
[sender] io timeout after 30 seconds -- exiting
[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(200) [sender=3.2.7]
rsync error: timeout in data send/receive (code 30) at io.c(200) [sender=3.2.7]
2025-02-06T15:46:56-0500 WARNING [sync_configs] Could not transfer config to worker4. Is it online?
2025-02-06T15:46:56-0500 WARNING [sync_configs] Could not transfer config to worker5. Is it online?
2025-02-06T15:46:56-0500 DEBUG  [sync_configs] Error moving file /home/pioreactor/.pioreactor/config.ini to worker4:/home/pioreactor/.pioreactor/config.ini.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 17, in rsync
    r = subprocess.run(("rsync",) + args, check=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('rsync', '-z', '--timeout', '30', '--inplace', '--checksum', '-e', 'ssh', '/home/pioreactor/.pioreactor/config.ini', 'worker4.local:/home/pioreactor/.pioreactor/config.ini')' returned non-zero exit status 30.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 25, in cp_file_across_cluster
    rsync(
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 20, in rsync
    raise RsyncError from e
pioreactor.exc.RsyncError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 584, in _thread_function
    sync_config_files(unit, shared, specific)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 174, in sync_config_files
    cp_file_across_cluster(unit, localpath, remotepath, timeout=30)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 37, in cp_file_across_cluster
    raise RsyncError(f"Error moving file {localpath} to {unit}:{remotepath}.")
pioreactor.exc.RsyncError: Error moving file /home/pioreactor/.pioreactor/config.ini to worker4:/home/pioreactor/.pioreactor/config.ini.
2025-02-06T15:46:56-0500 WARNING [sync_configs] Could not transfer config to worker3. Is it online?
2025-02-06T15:46:56-0500 DEBUG  [sync_configs] Error moving file /home/pioreactor/.pioreactor/config.ini to worker5:/home/pioreactor/.pioreactor/config.ini.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 17, in rsync
    r = subprocess.run(("rsync",) + args, check=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('rsync', '-z', '--timeout', '30', '--inplace', '--checksum', '-e', 'ssh', '/home/pioreactor/.pioreactor/config.ini', 'worker5.local:/home/pioreactor/.pioreactor/config.ini')' returned non-zero exit status 30.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 25, in cp_file_across_cluster
    rsync(
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 20, in rsync
    raise RsyncError from e
pioreactor.exc.RsyncError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 584, in _thread_function
    sync_config_files(unit, shared, specific)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 174, in sync_config_files
    cp_file_across_cluster(unit, localpath, remotepath, timeout=30)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 37, in cp_file_across_cluster
    raise RsyncError(f"Error moving file {localpath} to {unit}:{remotepath}.")
pioreactor.exc.RsyncError: Error moving file /home/pioreactor/.pioreactor/config.ini to worker5:/home/pioreactor/.pioreactor/config.ini.
2025-02-06T15:46:56-0500 DEBUG  [sync_configs] Error moving file /home/pioreactor/.pioreactor/config.ini to worker3:/home/pioreactor/.pioreactor/config.ini.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 17, in rsync
    r = subprocess.run(("rsync",) + args, check=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('rsync', '-z', '--timeout', '30', '--inplace', '--checksum', '-e', 'ssh', '/home/pioreactor/.pioreactor/config.ini', 'worker3.local:/home/pioreactor/.pioreactor/config.ini')' returned non-zero exit status 30.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 25, in cp_file_across_cluster
    rsync(
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 20, in rsync
    raise RsyncError from e
pioreactor.exc.RsyncError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 584, in _thread_function
    sync_config_files(unit, shared, specific)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 174, in sync_config_files
    cp_file_across_cluster(unit, localpath, remotepath, timeout=30)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 37, in cp_file_across_cluster
    raise RsyncError(f"Error moving file {localpath} to {unit}:{remotepath}.")
pioreactor.exc.RsyncError: Error moving file /home/pioreactor/.pioreactor/config.ini to worker3:/home/pioreactor/.pioreactor/config.ini.
[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(200) [sender=3.2.7]
2025-02-06T15:46:56-0500 WARNING [sync_configs] Could not transfer config to worker8. Is it online?
2025-02-06T15:46:56-0500 DEBUG  [sync_configs] Error moving file /home/pioreactor/.pioreactor/config.ini to worker8:/home/pioreactor/.pioreactor/config.ini.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 17, in rsync
    r = subprocess.run(("rsync",) + args, check=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('rsync', '-z', '--timeout', '30', '--inplace', '--checksum', '-e', 'ssh', '/home/pioreactor/.pioreactor/config.ini', 'worker8.local:/home/pioreactor/.pioreactor/config.ini')' returned non-zero exit status 30.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 25, in cp_file_across_cluster
    rsync(
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 20, in rsync
    raise RsyncError from e
pioreactor.exc.RsyncError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 584, in _thread_function
    sync_config_files(unit, shared, specific)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/cli/pios.py", line 174, in sync_config_files
    cp_file_across_cluster(unit, localpath, remotepath, timeout=30)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/networking.py", line 37, in cp_file_across_cluster
    raise RsyncError(f"Error moving file {localpath} to {unit}:{remotepath}.")
pioreactor.exc.RsyncError: Error moving file /home/pioreactor/.pioreactor/config.ini to worker8:/home/pioreactor/.pioreactor/config.ini.

Focusing just on the online part first: I think the solution is to “upgrade-forward” to resolve this.

  1. Use the UI to upload the 25.1.21 release zip file. This will upload the zip file to the leader, and try to update the workers, but fail. That’s okay.

  2. SSH into your leader, and run:

    ssh worker2.local "pio update --source /tmp/release_25.1.21.zip"
    

    (I’m only updating worker2 as a test, and you mentioned it looks to be okay). After that, try power-cycling worker2 and see if it comes online and has the correct version on the Inventory page.

  3. If so, you can try the other operational workers this way (not 1, 3, 4, 5, or 8).

  4. You can try doing the same for workers 1, 3, 4, 5, and 8, but if you encounter an error let me know and we can take it from there.
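Steps 3 and 4 can be scripted from the leader. In this sketch, RUN=echo makes it a dry run that only prints the ssh commands (remove that line to actually execute), and the worker names are placeholders for your own:

```shell
# Dry-run guard: with RUN=echo, the loop only prints each ssh command.
# Delete this line (or set RUN=) to actually run the updates.
RUN=echo

# Placeholder hostnames; substitute your own workers.
WORKERS="worker2 worker6 worker7"

for w in $WORKERS; do
  $RUN ssh "$w.local" "pio update --source /tmp/release_25.1.21.zip"
done
```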

Sorry I wasn’t clear enough in my first reply: despite the sync errors, all of the Pios besides the two I manually updated made it to 25.1.21 successfully, even the ones it wasn’t syncing with (1, 4, and 8).

Updating with ssh workerX.local "pio update --source /tmp/release_25.1.21.zip" was successful on worker 2 (no previous issues) and worker 1 (previously had sync issues), but not on worker 3 (sync issues and not responding in the inventory). Here are the logs for that last case:

+ pioreactorui.tasks.update_app_from_release_archive_on_specific_pioreactors
+ pioreactorui.tasks.pio
+ pioreactorui.tasks.pio_plugins_list
+ pioreactorui.tasks.pio_run_export_experiment_data
+ pioreactorui.tasks.pio_kill
+ pioreactorui.tasks.pio_plugins
+ pioreactorui.tasks.update_clock
+ pioreactorui.tasks.sync_clock
+ pioreactorui.tasks.pio_update_app
+ pioreactorui.tasks.pio_update
+ pioreactorui.tasks.pio_update_ui
+ pioreactorui.tasks.rm
+ pioreactorui.tasks.shutdown
+ pioreactorui.tasks.reboot
+ pioreactorui.tasks.pios
+ pioreactorui.tasks.save_file
+ pioreactorui.tasks.write_config_and_sync
+ pioreactorui.tasks.post_to_worker
+ pioreactorui.tasks.multicast_post_across_cluster
+ pioreactorui.tasks.get_from_worker
+ pioreactorui.tasks.multicast_get_across_cluster
+ pioreactorui.tasks.patch_to_worker
+ pioreactorui.tasks.multicast_patch_across_cluster
+ pioreactorui.tasks.delete_from_worker
+ pioreactorui.tasks.multicast_delete_across_cluster
2025-02-06T16:13:30-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:30-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:30-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:30-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:30-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:30-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:30-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:30-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:30-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:30-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:30-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:30-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:30-0500 [update_app] DEBUG mv /tmp/release_25.1.21/pioreactorui_*.tar.gz /tmp/pioreactorui_archive.tar.gz
2025-02-06T16:13:30-0500 [update_app] DEBUG sudo rm -rf /tmp/release_25.1.21
2025-02-06T16:13:30-0500 [pioreactorui-leader] DEBUG Starting pioreactorui-leader=25.1.21 on leader...
2025-02-06T16:13:30-0500 [pioreactorui-leader] DEBUG .env={'DOT_PIOREACTOR': '/home/pioreactor/.pioreactor/', 'WWW': '/var/www/pioreactorui/', 'PIO_EXECUTABLE': '', 'PIOS_EXECUTABLE': ''}
2025-02-06T16:13:30-0500 [update_app] NOTICE Updated Pioreactor app to version 25.1.21.
2025-02-06T16:13:30-0500 [monitor] INFO Updated versions from {'app': '25.1.21', 'hat': '0.0', 'firmwa.. to {'app': '25.1.21', 'hat': '0.0', 'firmwa...
2025-02-06T16:13:30-0500 [update_ui] DEBUG bash /usr/local/bin/update_ui.sh /tmp/pioreactorui_archive.tar.gz
2025-02-06T16:13:30-0500 [pioreactorui-leader] DEBUG Starting MQTT client
2025-02-06T16:13:32-0500 [huey.consumer] INFO Received SIGTERM
2025-02-06T16:13:32-0500 [huey.consumer] INFO Shutting down
2025-02-06T16:13:32-0500 [huey.consumer] INFO Consumer exiting.
2025-02-06T16:13:33-0500 [systemd] DEBUG huey.service successful
2025-02-06T16:13:33-0500 [update_ui] INFO Updated PioreactorUI to version /tmp/pioreactorui_archive.tar.gz.
2025-02-06T16:13:33-0500 [pioreactorui-leader] DEBUG Starting pioreactorui-leader=25.1.21 on leader...
2025-02-06T16:13:33-0500 [pioreactorui-leader] DEBUG .env={'DOT_PIOREACTOR': '/home/pioreactor/.pioreactor/', 'WWW': '/var/www/pioreactorui/', 'PIO_EXECUTABLE': '', 'PIOS_EXECUTABLE': ''}
2025-02-06T16:13:33-0500 [huey.consumer] INFO Huey consumer started with 6 thread, PID 7223 at 2025-02-06 21:13:33.319348
2025-02-06T16:13:33-0500 [huey.consumer] INFO Scheduler runs every 1 second(s).
2025-02-06T16:13:33-0500 [huey.consumer] INFO Periodic tasks are disabled.
2025-02-06T16:13:33-0500 [huey.consumer] INFO The following commands are available:
+ pioreactorui.tasks.pio_run
+ pioreactorui.tasks.add_new_pioreactor
+ pioreactorui.tasks.update_app_across_cluster
+ pioreactorui.tasks.update_app_from_release_archive_across_cluster
+ pioreactorui.tasks.update_app_from_release_archive_on_specific_pioreactors
+ pioreactorui.tasks.pio
+ pioreactorui.tasks.pio_plugins_list
+ pioreactorui.tasks.pio_run_export_experiment_data
+ pioreactorui.tasks.pio_kill
+ pioreactorui.tasks.pio_plugins
+ pioreactorui.tasks.update_clock
+ pioreactorui.tasks.sync_clock
+ pioreactorui.tasks.pio_update_app
+ pioreactorui.tasks.pio_update
+ pioreactorui.tasks.pio_update_ui
+ pioreactorui.tasks.rm
+ pioreactorui.tasks.shutdown
+ pioreactorui.tasks.reboot
+ pioreactorui.tasks.pios
+ pioreactorui.tasks.save_file
+ pioreactorui.tasks.write_config_and_sync
+ pioreactorui.tasks.post_to_worker
+ pioreactorui.tasks.multicast_post_across_cluster
+ pioreactorui.tasks.get_from_worker
+ pioreactorui.tasks.multicast_get_across_cluster
+ pioreactorui.tasks.patch_to_worker
+ pioreactorui.tasks.multicast_patch_across_cluster
+ pioreactorui.tasks.delete_from_worker
+ pioreactorui.tasks.multicast_delete_across_cluster
2025-02-06T16:13:33-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:33-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:33-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:33-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:33-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:33-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:33-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:33-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:33-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:33-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:33-0500 [huey.consumer] INFO Starting Huey consumer...
2025-02-06T16:13:33-0500 [huey.consumer] INFO Cache directory = /tmp/pioreactor_cache
2025-02-06T16:13:33-0500 [pioreactorui-leader] DEBUG Starting pioreactorui-leader=25.1.21 on leader...
2025-02-06T16:13:33-0500 [pioreactorui-leader] DEBUG .env={'DOT_PIOREACTOR': '/home/pioreactor/.pioreactor/', 'WWW': '/var/www/pioreactorui/', 'PIO_EXECUTABLE': '', 'PIOS_EXECUTABLE': ''}
2025-02-06T16:13:33-0500 [pioreactorui-leader] DEBUG Starting MQTT client

Ok gotcha, so we have the cluster updated to 25.1.21, but some workers aren’t connected as seen on the Inventory page.

Again, try the following: SSH into one of those workers and run:

pio run monitor

Does that fail? If it runs fine, try rebooting with sudo reboot and, fingers crossed, it should come back up working. If it does fail (or doesn’t come back after a reboot), look for errors in:

sudo systemctl status pioreactor_startup_run@monitor.service

I’m not sure why pios sync-configs isn’t working; I’ve never seen that error. I wonder if the network is getting overloaded. Can you try:

pios sync-configs --units worker1 --units worker2 --units worker3

(trying a smaller subset of units to see if that still works)
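One guess on the password prompts: sync-configs copies files with rsync over ssh, so if the leader’s key isn’t authorized on a worker, you’ll get password prompts and timeouts. You can test key-based auth non-interactively from the leader (BatchMode disables password prompts, so a missing key fails fast; check_ssh is just a throwaway helper):

```shell
# Throwaway helper: succeeds only if key-based ssh works without a password.
check_ssh() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
    echo "passwordless ssh to $1 OK"
  else
    echo "key auth to $1 FAILED"
  fi
}

check_ssh worker4.local
```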

Hi there, sorry for the delayed response; I just got a chance to take a look at this again.

When I tried pio run monitor on one of the failing workers, it errored out because it can’t find storage.temporary_cache. That seems like something related to updating from 24.10.29 to 24.12.10, but when I check the version it reports 25.1.21. Here’s the full traceback in case there’s something more I’m missing in it:

WARNING [read config] Not found in configuration: 'storage.temporary_cache'. Are you missing the following in your config?

[storage]
temporary_cache=some value


Traceback (most recent call last):
  File "/usr/lib/python3.11/configparser.py", line 805, in get
    value = d[option]
            ~^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 1004, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 996, in __missing__
    raise KeyError(key)
KeyError: 'temporary_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pio", line 8, in <module>
    sys.exit(pio())
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/monitor.py", line 619, in click_monitor
    job = Monitor(unit=whoami.get_unit_name(), experiment=whoami.UNIVERSAL_EXPERIMENT)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/base.py", line 105, in __call__
    obj = type.__call__(cls, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/monitor.py", line 102, in __init__
    super().__init__(unit=unit, experiment=experiment)
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/base.py", line 941, in __init__
    super().__init__(unit, experiment, source="app")
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/base.py", line 282, in __init__
    self._check_for_duplicate_activity()
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/background_jobs/base.py", line 917, in _check_for_duplicate_activity
    if is_pio_job_running(self.job_name) and not is_testing_env():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/__init__.py", line 447, in is_pio_job_running
    with JobManager() as jm:
         ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/utils/__init__.py", line 614, in __init__
    db_path = config.get("storage", "temporary_cache")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/config.py", line 92, in get
    raise e
  File "/usr/local/lib/python3.11/dist-packages/pioreactor/config.py", line 77, in get
    return super().get(section, option, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/configparser.py", line 808, in get
    raise NoOptionError(option, section)
configparser.NoOptionError: No option 'temporary_cache' in section: 'storage'
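(For context: this is just Python's standard configparser reporting that the [storage] section exists but has no temporary_cache key. A minimal sketch reproducing the behavior — the section and option names match the traceback, everything else is illustrative:)

```python
import configparser

# A [storage] section missing the temporary_cache option,
# mimicking the broken config.ini on the failing workers.
config = configparser.ConfigParser()
config.read_string("""
[storage]
database=/home/pioreactor/.pioreactor/storage/pioreactor.sqlite
""")

try:
    config.get("storage", "temporary_cache")
except configparser.NoOptionError as e:
    print(e)  # No option 'temporary_cache' in section: 'storage'

# Passing a fallback avoids the exception, handy when probing a config:
path = config.get("storage", "temporary_cache", fallback=None)
print(path)  # None
```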

Additionally, when I tried sudo systemctl status pioreactor_startup_run@monitor.service, I got an exit code that I’m not really sure how to interpret:

× pioreactor_startup_run@monitor.service - Start up monitor on boot.
     Loaded: loaded (/etc/systemd/system/pioreactor_startup_run@.service; enabl>
     Active: failed (Result: exit-code) since Thu 2025-02-06 15:37:59 EST; 4 da>
   Duration: 20min 43.767s
    Process: 661 ExecStart=pio run monitor (code=exited, status=1/FAILURE)
   Main PID: 661 (code=exited, status=1/FAILURE)
        CPU: 1.322s

Finally, when trying sync-configs on a subset (I used worker4 rather than worker3, since worker3 is one of the ones I’m having the most trouble with currently), it prompted me multiple times for worker4’s password, saying Permission denied, please try again. after each correct entry. After the third try it seemingly ran the command (there was no visible confirmation, and their configs were already identical). When I ran it a second time, it worked immediately without asking for any worker’s password.

Editing to add, I tried syncing a subset of 3 Pios including worker3 (one of the ones that’s not behaving correctly), and that was successful (it did prompt for worker3’s password, but no others).

@DonalC it sounds like there’s some odd credential mismatch - normally you don’t need to ask for a password since on the initial handshake, the leader & worker share certificates.

Can you try running pio workers add worker4? Re-adding a worker is harmless, and you don’t need to “delete” it first to re-add it. This will redo the handshake. Note: It will also power-cycle the worker at the end.

After it comes back online, try your sync-configs test.

It worked properly this time (though it did prompt for worker1’s password, which was accepted without any issue once entered), but when worker4 was re-added it said it was on 24.10.29 rather than 25.1.21 (and I confirmed this with pio version). Any thoughts on why that might be?

Also, any ideas on what to do with the other two that still aren’t responding? I tried manually re-adding them from the leader, which somewhat worked, but they both produced a log saying the webserver isn’t online. When I check the logs directly on those Pios, there doesn’t seem to be anything about this. Additionally, only one of the two with this problem shows the missing storage.temporary_cache warning (even after syncing configs with the other Pios and the leader).

When worker4 was re-added it said it was on 24.10.29 rather than 25.1.21 (and I confirmed this with pio version), any thoughts why that might be?

It’s not obvious to me! Maybe a Python installation issue (user vs. root installation)? But that’s another deep hole to go down.

Try looking at the logs from:

sudo systemctl status lighttpd.service
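If systemctl shows lighttpd running but you still get “webserver isn’t online” logs, a quick cross-check is to request the worker’s web UI directly from another machine. A sketch, assuming the worker is reachable as worker3.local and serves HTTP on port 80 (both assumptions — adjust for your network):

```python
from urllib.request import urlopen
from urllib.error import URLError

def webserver_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a 2xx/3xx, else False."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (URLError, OSError):
        return False

# Hypothetical worker hostname; any successful HTTP response counts as "online".
print(webserver_is_up("http://worker3.local:80/"))
```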

If you are missing the temporary_cache config, try adding it to ~/.pioreactor/config.ini manually using nano:

[storage]
database=/home/pioreactor/.pioreactor/storage/pioreactor.sqlite
temporary_cache=/tmp/pioreactor_cache/local_intermittent_pioreactor_metadata.sqlite
persistent_cache=/home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite

And then restarting the webserver:

sudo systemctl restart lighttpd.service && sudo systemctl restart huey.service
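On the cache files looking “empty” in nano: these are binary SQLite databases, so a text editor isn’t a reliable check. Every SQLite 3 file starts with the 16-byte header “SQLite format 3\0”, so a small sketch like this (run on the worker; the path is the one from config.ini) can tell a real database apart from a truly empty file:

```python
import os

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file

def is_sqlite_db(path: str) -> bool:
    """Return True if `path` exists, is non-empty, and starts with the SQLite header."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC

print(is_sqlite_db(
    "/home/pioreactor/.pioreactor/storage/local_persistent_pioreactor_metadata.sqlite"
))
```

If this prints False on the failing workers but True on the healthy ones, the cache file itself (not just the config entry) is missing or zeroed out.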

I was poking around some more and realized that the calibrations page would not display any of my legacy calibrations (despite every worker having them), and when I tried downloading calibrations from that page, the folder of .yaml files only had calibrations for half of the workers.

I’m just going to pull everything I have off all my workers and the leader and reflash them, starting from the newest image, since something appears to be irreparably wrong. Thank you for all your help!