How difficult would it be to set up an automated means of sending run data from multiple users to a central project database?
For example, when collaborating on open data projects, it would be amazing if anyone interested could analyse and visualise large data sets from multiple contributors.
I don’t think this would be very difficult at all, actually. It would certainly use the plugin system: all users would install the plugin and add the right DB credentials on their local system.
Are you thinking the data should arrive in realtime (e.g. to be displayed and graphed in a UI) at the central database? Or batched in data dumps?
That’s excellent news. While I was originally thinking that the data would be sent at the conclusion of each experiment, the realtime idea has a lot of advantages:
- You could have people across different timezones looking after your experiments while you sleep.
- People who don’t yet have Pioreactors could still engage in live experiments.
- You could do remote teaching where the students are able to manipulate the data and make suggestions during the live experiment.
- If your lab was sabotaged, the data would be safely stored offsite.
- We could build Pioreactor digital twins.
Here’s an outline of a few ways realtime could be done:
Using MQTT (no Python, but lots of networking)
The Pioreactor uses Mosquitto, an open-source MQTT broker, for its realtime data streams. A Mosquitto broker runs on the leader in each Pioreactor cluster.
It’s possible to define a “bridge” between Mosquitto brokers so that they can share messages. In this case, we just want a one-way stream from all users to a central Mosquitto broker. On the central server, we can write those messages to disk (or duplicate what we do in the Pioreactor, and write to tables in SQLite3).
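As a rough illustration, the bridge on each user’s leader might look like the snippet below, dropped into Mosquitto’s config directory. The hostname, credentials, and topic prefixes are all placeholders, not anything the Pioreactor ships with:

```
# e.g. /etc/mosquitto/conf.d/central-bridge.conf on a user's leader
# (hostname and credentials below are placeholders)
connection central-bridge
address central.example.com:1883
remote_username alice
remote_password secret

# Forward all pioreactor topics one-way ("out") at QoS 1, prefixing
# them with the contributor's name on the remote side so that messages
# from different users don't collide.
topic pioreactor/# out 1 "" users/alice/
```

The remote-prefix trick is just one way to keep contributors’ streams separate; you could also encode the contributor in the payload instead.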
The central server needs to be reachable by every user, which suggests a public endpoint (with credentials), but you could also use a project like Tailscale to create a private VPN between users’ leaders.
Using plugins (more Python, but less networking)
You can write a Python script that
- listens to all topics in MQTT, and
- serializes the topic and message and pushes it to an API on some central server.
The central server is set up as a webserver and knows how to handle incoming requests (writing to disk, saving to a database, etc.).
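Here’s a minimal sketch of what that central webserver could look like, using only the Python standard library. The table name, endpoint path, and port are assumptions for illustration, not part of any existing Pioreactor code:

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def init_db(path=":memory:"):
    # One generic table for all incoming MQTT messages; a real deployment
    # would likely split these out by message type.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mqtt_messages (
               topic       TEXT,
               payload     TEXT,
               received_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def store_message(conn, topic, payload):
    conn.execute(
        "INSERT INTO mqtt_messages (topic, payload) VALUES (?, ?)",
        (topic, payload),
    )
    conn.commit()


class IngestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"topic": "...", "payload": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length))
        store_message(self.server.db, data["topic"], data["payload"])
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", 8000), IngestHandler)
    server.db = init_db("central.sqlite")
    server.serve_forever()
```

In practice you’d put this behind TLS and some form of per-user authentication before exposing it publicly.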
This script can be distributed to users to run in the background. More formally, it could even be packaged as a proper Pioreactor plugin.
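The user-side script could be as small as the sketch below. It uses paho-mqtt (which the Pioreactor software already depends on) plus the standard library; the central endpoint URL is a placeholder:

```python
import json
import urllib.request

# URL of the central server's ingest endpoint -- a placeholder, adjust to taste.
CENTRAL_API = "http://central.example.com:8000/ingest"


def serialize(topic: str, payload: bytes) -> dict:
    """Turn an MQTT message into a JSON-friendly dict."""
    return {"topic": topic, "payload": payload.decode("utf-8", errors="replace")}


def forward(message: dict) -> None:
    """POST one serialized message to the central server."""
    req = urllib.request.Request(
        CENTRAL_API,
        data=json.dumps(message).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


if __name__ == "__main__":
    # paho-mqtt is only needed at runtime, on the Pioreactor itself.
    import paho.mqtt.client as mqtt

    client = mqtt.Client()
    client.on_message = lambda c, userdata, msg: forward(
        serialize(msg.topic, msg.payload)
    )
    client.connect("localhost")  # the leader's own Mosquitto broker
    client.subscribe("#")        # listen to all topics
    client.loop_forever()
```

A production version would want batching, retries, and a filter on which topics get forwarded, but the shape is the same.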
That sounds excellent, I imagine the plugin option would be better long term. Do you have a feel for which would be closer to real-time?
I agree, and the plugin would be easier to debug and modify, too. Both should have sub-second latency, I imagine. The MQTT approach would probably be faster, but I don’t think users would notice the difference.
And do you think we’d be best sticking with SQLite for the cloud database? I normally use GCP (mainly for their carbon credentials), but their Cloud SQL doesn’t include SQLite, and not many hosts appear to support it to the same extent that they support PostgreSQL, MySQL, and various NoSQL options. Conversely, one benefit of sticking with SQLite in the cloud would be easier repurposing of Pioreactor code for visualisation of the online data, etc.
I would use whatever DB you’re most comfortable with. I like SQLite because it works well on Raspberry Pis, but if you are going to use GCP, by all means use what you like!
I imagine you’d have to design your own schema, too, perhaps loosely based on some of Pioreactor’s tables (the schema is here, btw). The Pioreactor UI’s visualizations are just queries against those tables, and these queries can be adapted to other databases with ease.
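To make the schema idea concrete, here’s one possible central table, loosely modeled on the shape of Pioreactor’s OD-readings data but with an extra `contributor` column for the multi-user setting. The column names and the query are illustrative guesses, not the actual Pioreactor schema:

```python
import sqlite3

# A hypothetical central table for a multi-contributor project.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE od_readings (
           contributor      TEXT,
           experiment       TEXT,
           pioreactor_unit  TEXT,
           timestamp        TEXT,
           od_reading       REAL
       )"""
)
conn.executemany(
    "INSERT INTO od_readings VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "exp1", "worker1", "2024-01-01T00:00:00Z", 0.12),
        ("alice", "exp1", "worker1", "2024-01-01T00:05:00Z", 0.15),
        ("bob",   "exp1", "worker2", "2024-01-01T00:02:00Z", 0.10),
    ],
)

# The kind of query a live chart might run: the latest reading per
# contributor and unit. (SQLite's MAX() conveniently pulls the other
# columns from the matching row.)
rows = conn.execute(
    """SELECT contributor, pioreactor_unit, MAX(timestamp), od_reading
       FROM od_readings
       GROUP BY contributor, pioreactor_unit"""
).fetchall()
```

Swapping this to PostgreSQL or MySQL would mostly mean rewriting that query with a window function or a self-join, since the bare-column-with-MAX shortcut is SQLite-specific.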