I just looked at the code and was thinking: Do the workers save their “state” somewhere so it is persistent over power-cycles? I’m asking because we had a blackout while the workers were assigned to the “old” leader, after which I had to re-image the leader. Could there be something that is conflicting now?
mm, not much state is saved on the workers. Assignments are saved on the leader, and workers ask the leader what experiment they are in when a job starts.
Can you run the following on a worker? Open Python with python3, and copy-paste the following:
from pioreactor.pubsub import get_from_leader
from pioreactor.whoami import get_unit_name
unit_name = get_unit_name()
result = get_from_leader(f"/api/workers/{unit_name}/experiment")
print(result.json())
Does that resolve to show the correct experiment?
So with this I was able to solve it partially:
Apparently the API query from the worker to the leader is case-sensitive. My workers have capital letters in their hostnames, but I registered them to the cluster in all lowercase. That worked, but it produced all the problems described in the thread. When I register the workers with the correct capitalization, I get the correct information and status displayed and can assign them to experiments.
But it looks like I cannot control them from the WebUI (I tried to start stirring and so on, but it just times out) or from the leader using “pios run stirring”. I can, however, now start the jobs locally on the workers (pio run stirring).
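For anyone hitting the same thing, here is a minimal sketch of the kind of mismatch I mean. It is plain Python and does not touch the Pioreactor API; the function name diagnose_name and the example worker names are made up for illustration. It only shows why a hostname like "Worker01" fails an exact lookup against a name registered as "worker01".

```python
def diagnose_name(hostname, registered_names):
    """Classify how a worker's hostname relates to the names registered on the leader.

    Hypothetical helper for illustration only; it is not part of Pioreactor.
    """
    # An exact, case-sensitive comparison is what a case-sensitive lookup does.
    if hostname in registered_names:
        return "exact match"

    # Fall back to a case-insensitive comparison to detect a capitalization-only difference.
    folded = hostname.casefold()
    for name in registered_names:
        if name.casefold() == folded:
            return f"case mismatch: registered as {name!r}, hostname is {hostname!r}"

    return "not registered"


# Example names are assumptions, not taken from a real cluster.
print(diagnose_name("Worker01", ["worker01", "worker02"]))
```

If this reports a case mismatch for your setup, re-registering the worker with the exact hostname capitalization is what fixed it for me.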
Thanks for the help so far!
Best,
Kai
wow great catch, we’ll test upper/lowercase names and make this more clear to future users.
I think the last thing you may need to do is update the workers’ webserver. Again, if you have access to the internet, try the following on the workers.
pio update ui
If you don’t have access, upload this file (pioreactorui_24.10.30.tar.gz) to your workers, and run:
pio update ui --source pioreactorui_24.10.30.tar.gz
Yep that did the trick! Thank you very much
Best, Kai