Calculating mathematical distance between two dosing functions

Hey @CamDavidsonPilon , it seems like you have an interest in math and statistics and I thought you might find this question interesting to think about.

How would you quantify the distance between two dosing functions over a period of time as a single number? For example, imagine I’m running an experiment with pioreactors 1 through N, and pioreactor ‘x’ (P_x) is a chemostat operating at these values at time ‘t’:

  • Media (dose/hour): a_dose(x, t) = A_x * sin( W_x * t) + B_x
  • Alternate Media (dose/hour): am_dose(x, t) = C_x * sin( Y_x * t) + D_x
  • Target Temperature (temp_setpoint): temp(x, t) = E_x * sin( Z_x * t) + F_x

A_x is an element of [A_min, A_max].
Likewise, B_x is an element of [B_min, B_max], C_x of [C_min, C_max], and so on.

I was thinking of using a Euclidean distance formula to find the distance between P_x1 and P_x2:

Distance(P_x1, P_x2) = sqrt( [a_dose(x2, t) - a_dose(x1, t)]^2 + [am_dose(x2, t) - am_dose(x1, t)]^2 + … )

If I run all of the settings using sine/cosine functions, then I want a general, normalized function that gives me the “distance” between two such functions at time t. This could look like:

norm_dist_a (x1, x2, t) = [ a_dose(x2, t) - a_dose(x1, t) ] ^ 2 / [ a_dose_max - a_dose_min ] ^ 2
norm_dist_am (x1, x2, t) = [ am_dose(x2, t) - am_dose(x1, t) ] ^ 2 / [ am_dose_max - am_dose_min ] ^ 2
norm_dist_temp (x1, x2, t) = [ temp(x2, t) - temp(x1, t) ] ^ 2 / [ temp_max - temp_min ] ^ 2

Then, the distance between P_x1 and P_x2 at time t is:

Distance(P_x1, P_x2, t) = sqrt( norm_dist_a(x1, x2, t) + norm_dist_am(x1, x2, t) + norm_dist_temp(x1, x2, t) )
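As a rough sketch of the formula above in Python: the helper names (`sine_setting`, `distance_at`), the dictionary layout, and all numeric values are made up for illustration and are not part of any Pioreactor API. Each setting's difference is divided by its allowed range before squaring, which is the same as dividing the squared difference by the squared range.

```python
import math

def sine_setting(amplitude, frequency, offset):
    """Return f(t) = amplitude * sin(frequency * t) + offset."""
    def f(t):
        return amplitude * math.sin(frequency * t) + offset
    return f

def distance_at(p1, p2, ranges, t):
    """Normalized Euclidean distance between two reactors at time t.

    p1, p2: dicts mapping a setting name to a function of t.
    ranges: dict mapping the same names to that setting's (min, max).
    """
    total = 0.0
    for name, (lo, hi) in ranges.items():
        diff = p2[name](t) - p1[name](t)
        total += (diff / (hi - lo)) ** 2  # one norm_dist_* term
    return math.sqrt(total)

# Hypothetical settings for two reactors (values are placeholders).
ranges = {"a_dose": (0.0, 4.0), "am_dose": (0.0, 2.0), "temp": (25.0, 40.0)}
p_x1 = {
    "a_dose": sine_setting(1.0, 0.5, 2.0),
    "am_dose": sine_setting(0.5, 0.3, 1.0),
    "temp": sine_setting(3.0, 0.26, 32.0),
}
p_x2 = {
    "a_dose": sine_setting(1.5, 0.7, 2.0),
    "am_dose": sine_setting(0.2, 0.3, 1.0),
    "temp": sine_setting(5.0, 0.26, 33.0),
}

print(distance_at(p_x1, p_x2, ranges, t=6.0))
```

If every setting stays inside its [min, max] range, each term is at most 1, so with three settings the distance is bounded by sqrt(3).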

Using this, you can calculate a distance between two pioreactor environments at time t, but is there a way to extend this concept of a “distance” over a period of time, for example 24 hours? I’m not too familiar with statistics, but this seems like a reasonable way to quantify the distribution of distances between P_x1 and P_x2 over a 24-hour period.
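One possible way to do this, as a sketch only: sample the pointwise distance on an even time grid over the period and summarize the samples. The function name `summarize_distance`, the grid size, and the example distance function are assumptions for illustration. The root-mean-square value approximates sqrt( (1/T) * integral of Distance(t)^2 dt ), which is the time-averaged version of the same Euclidean idea.

```python
import math

def summarize_distance(dist, t_start, t_end, n=1000):
    """Sample dist(t) at n interval midpoints; return mean, RMS, and max."""
    step = (t_end - t_start) / n
    samples = [dist(t_start + (i + 0.5) * step) for i in range(n)]
    mean = sum(samples) / n
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return {"mean": mean, "rms": rms, "max": max(samples)}

# Placeholder pointwise distance with a 24-hour period (t in hours).
def example_distance(t):
    return abs(math.sin(2 * math.pi * t / 24))

summary = summarize_distance(example_distance, 0.0, 24.0)
print(summary)
```

The mean and max describe the distribution of distances over the window, while the RMS collapses it into one number that weights large separations more heavily.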

What are your thoughts?

I’ve been thinking of this over the weekend, and I have a few thoughts, some of which may be useful to you:

  • given a culture with constant growth rate, a turbidostat and a chemostat look identical if we are only comparing their input / output. So under some constructed distance functions, they are identical. Does this feel correct? I’m not sure.
    • If we also compare the cultures’ OD time series, then they are no longer identical. Maybe this motivates including the culture OD time series as an input.
    • An interesting approach is an algorithmic one: given a dosing function, you can write it out as an algorithm (like what I’ve done here, but more formally). You could then define the distance between two automations as a string-based distance metric applied to the two algorithms. This solves the turbidostat/chemostat distinction above, too.
  • As for your approach, the L2 distance metric you propose is fine. However, you are working with time series, and there are many distance measures between two time series that could be applied. A simple one is the cross-correlation between the two time series. I think this can be massaged into a proper distance function, too.
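As one hedged illustration of the cross-correlation idea: the sketch below uses only the zero-lag case (the Pearson correlation r of two equally sampled series) and turns it into a distance via sqrt(2 * (1 - r)). That quantity is proportional to the Euclidean distance between the two standardized series, so it satisfies the triangle inequality. The function name and the sample data are hypothetical, and this is only one of several possible constructions.

```python
import math

def correlation_distance(xs, ys):
    """Return sqrt(2 * (1 - r)), with r the Pearson correlation of xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)
    r = max(-1.0, min(1.0, r))  # guard against floating-point overshoot
    return math.sqrt(2.0 * (1.0 - r))

# Placeholder series: hourly samples over 24 hours.
ts = [float(h) for h in range(24)]
base = [math.sin(0.5 * t) for t in ts]
scaled = [3.0 * v + 1.0 for v in base]   # same shape, different amplitude/offset
flipped = [-v for v in base]             # perfectly anti-correlated

print(correlation_distance(base, scaled))
print(correlation_distance(base, flipped))
```

Note the trade-off: this measure ignores offset and amplitude, so two dosing functions with the same shape but different dose levels come out at distance 0, unlike the normalized L2 distance.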

I don’t have a complete answer at this time, but this is an interesting open question!