Friday, March 16, 2007

A Bayesian model for predicting race times

I made a simple Bayesian model for calculating run times based on a set of past performances.
The calculation is done using the formula
u(s; c,a) = c*s^(-a)
where u is a speed, s is the distance for which the speed is calculated, and c, a are random variables: c is the subject's speed over a unit distance (loosely, their maximum speed) and a is the rate at which that speed degrades with distance. We start with a prior distribution p(c,a) over these two variables.
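
As a small illustration of the formula (a sketch in Python, not taken from the linked Octave code), the run time over a distance s follows from t = s / u = s^(1+a) / c. The function names and the example values of c and a below are made up for illustration; s and c must use consistent units.

```python
def predicted_speed(s, c, a):
    """Speed u(s; c, a) = c * s**(-a) at distance s."""
    return c * s ** (-a)


def predicted_time(s, c, a):
    """Time to cover distance s at the predicted speed: t = s / u."""
    return s / predicted_speed(s, c, a)


# Hypothetical runner: c = 6.0 (speed at unit distance), a = 0.07.
for distance in (1.0, 5.0, 10.0):
    print(distance, predicted_speed(distance, 6.0, 0.07),
          predicted_time(distance, 6.0, 0.07))
```

With a = 0 the speed is the same at every distance; larger a means the runner slows down more as the distance grows.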

Then we calculate a posterior distribution
p(c,a|D) = p(D|c,a) p(c,a) / Z
where D is a set of run time data and Z is a normalisation constant. We can write the likelihood p(D|c,a) as
p(D|c,a) = \prod_i p(s_i,u_i | c,a)
assuming the data are independent given c,a, which works OK for this simple model.
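Then we assume a Laplacian likelihood, i.e. p(s_i, u_i | c, a) \propto \exp(-|u_i - c*s_i^(-a)|), so that an observation becomes exponentially less probable as the absolute error of the predicted speed grows.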
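
One simple way to compute such a posterior is to evaluate it on a grid of (c, a) values. The sketch below is in Python and is not the linked Octave code. It assumes a uniform prior over the grid, and it takes the Laplacian density with a negative exponent and unit scale, exp(-|u_i - c*s_i^(-a)|), so that larger errors are less probable. The function names, the grid, and the data are all hypothetical.

```python
import math


def log_likelihood(data, c, a):
    """Sum of log Laplacian terms, -|u_i - c * s_i**(-a)|, up to a constant."""
    return -sum(abs(u - c * s ** (-a)) for s, u in data)


def grid_posterior(data, c_grid, a_grid):
    """Posterior p(c, a | D) on a grid, assuming a uniform prior (assumption)."""
    log_post = {(c, a): log_likelihood(data, c, a)
                for c in c_grid for a in a_grid}
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post.values())
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    z = sum(weights.values())  # the normalisation constant Z, up to a factor
    return {k: w / z for k, w in weights.items()}


# Made-up (distance, speed) pairs, for illustration only.
data = [(1.0, 6.1), (5.0, 5.4), (10.0, 5.0)]
c_grid = [4.0 + 0.1 * i for i in range(41)]
a_grid = [0.01 * i for i in range(31)]
posterior = grid_posterior(data, c_grid, a_grid)
```

A grid is crude but adequate for two parameters; a non-uniform prior would simply add log p(c,a) to each entry before normalising.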

Finally, we can calculate the most probable pair (c,a), i.e. the MAP estimate, the expected values of c and a, and credible intervals for them. This gives MAP, expected, and credible-interval values for the speed at various distances, and thus for running times.
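
To make these summaries concrete, here is a sketch in Python (again, not the linked Octave code) that takes a discrete posterior over (c, a) pairs and reports values for the run time over a distance s. It is one possible reading of the step above: the expected time is averaged over the posterior rather than computed from the expected c and a, and the interval is an equal-tailed credible interval. The function names and the tiny example posterior are hypothetical.

```python
def map_estimate(posterior):
    """Most probable (c, a) pair in a {(c, a): probability} table."""
    return max(posterior, key=posterior.get)


def run_time(s, c, a):
    """Time over distance s when speed is c * s**(-a)."""
    return s ** (1 + a) / c


def expected_time(posterior, s):
    """Posterior mean of the run time over distance s."""
    return sum(p * run_time(s, c, a) for (c, a), p in posterior.items())


def credible_interval(posterior, s, mass=0.95):
    """Equal-tailed credible interval for the run time over distance s."""
    points = sorted((run_time(s, c, a), p) for (c, a), p in posterior.items())
    lower_q = (1 - mass) / 2
    upper_q = 1 - lower_q
    lower = upper = None
    cumulative = 0.0
    for t, p in points:
        cumulative += p
        if lower is None and cumulative >= lower_q:
            lower = t
        if upper is None and cumulative >= upper_q:
            upper = t
    # Guard against rounding leaving the cumulative sum just below upper_q.
    if upper is None:
        upper = points[-1][0]
    return lower, upper


# Made-up posterior over three (c, a) pairs, for illustration only.
example = {(6.0, 0.05): 0.2, (5.8, 0.07): 0.5, (5.5, 0.10): 0.3}
c_map, a_map = map_estimate(example)
print(run_time(10.0, c_map, a_map), expected_time(example, 10.0),
      credible_interval(example, 10.0))
```

The MAP time and the expected time generally differ, since the run time is a nonlinear function of c and a.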

The code (for octave/matlab) is here:
http://www.idiap.ch/~dimitrak/downloads/running_formula.m