Synergy Percolation MCMC


Outline of the model

In an epidemic simulation that can be thought of as pure percolation, the rate of infection and infectious period are both constants. In this simulation containing synergy, the infectious period remains constant, but the rate of infection is now dependent on the environment around the infecting site. Specifically, the infectious rate of an infected individual increases linearly with the number of infectious individuals the site is attached to, rate=alpha+beta*n where n is the number of attached sites.



In this diagram, the rate at which site 4 becomes infected is alpha. Site 5 becomes infected at a rate alpha + beta, because the site next to it is connected to one other infectious individual (not 2, since there is no bond going to its right). Site 2 is therefore under an infectious pressure of 2*alpha+beta, and so on.

Since we're concerned with the presence or absence of bonds, it becomes necessary to think of things in terms of bonds being created rather than the sites becoming infected. Therefore for site 2, instead of thinking of a rate of 2*alpha+beta, we think of the left bond being created with rate alpha, and the right with rate alpha+beta. Bonds must also be considered between infectious sites. In order to be able to use percolation techniques, in the absence of synergy the system must reproduce simple percolation. This requires the creation of bonds to be the same under all conditions, and therefore the creation of bonds between two infectious sites must happen at rate alpha, possibly half what one might expect. This is interpreted as the average of the two infectious sites' rates, to allow calculation of the rate when synergy is present. Note that no bond can ever be made to a site in the removed state.

Once this simulation is performed using a suitable algorithm, the results are taken to be the times of infection of each of the sites. The times that the bonds were created was also outputted, for verification of the likelihood. For the output to be as close as possible to the experimental data, the infection times of the sites is 'quantized' such that the actual infection times is replaced by a min and max time, aligned on boundaries (ie, site 1 was infected between day 1 and 2, site 2 between 2 and 3 etc.).

Description of the algorithm

N.b. This algorithm might not work, and is here to help describe the problem so that it can be debugged!

The site-infection data are read in, and bonds 'created' between every neighbouring infectious site at the minimum possible time, except those whose infectious periods do not overlap by the initial lifetime of the infection. Although this is perhaps not the most intelligent of initial conditions for the mcmc process, it is by far the easiest way to construct a system whose likelihood is not zero!

Now, the main loop begins. Changes are proposed, and accepted with a Metropolis-Hastings acceptance probability: Going from state x to x', the probability is min(1,L(x')jm(x')g'm(u') / L(x)jm(x)gm(u)) where L is the likelihood function, jm(x) is the probability of choosing move m when at x, and gm(u) is the probability density of choosing the particular value u required to make the jump (ie, if going up in dimensionality, u represents all the new numbers needed, and when going down, g=1). To demonstrate, these are the jumps that are made when going through the loop. The first thing that happens is that for each possible bond, We propose one of four things, depending whether the bond is currently there or not. If the bond is there, with probability 1/2 we either:

  1. Change the time it was created. We have no change in dimensionality, so this is an easy step. The new time is chosen from a uniform distribution whose minimum and maximum are the least and greatest times the bond can possibly have been made - ie. if the infection times of the two sites are bounded by (min1,max1) and (min2,max2), the bond is created at between max(min1,min2) and min(max1,max2)+tau, where tau is the lifetime of the infection. Note that this distribution does not change, assuming tau is fixed (it is currently). The likelihood of the new state needs to take into account that the time of infection of each site must be recalculated (this is simply the minimum time of all bonds around the site). If this time is outside the bounds of that site, then the likelihood is zero. This proposal is then accepted with a probability given by min(1,ratio-of-likelihoods). Dimensionality does not change, and the distributions, being uniform and unchanging, cancel.
  2. Turn the bond off. In this case, the dimensionality is changing, which has to be taken into account. Once again, the likelihood is calculated before and after the change, taking into account that the new state might be impossible, but more needs to be taken into account in the acceptance ratio. gm(u) and g'm(u') are not identical, and do not cancel. g'm(u') is one - no new numbers need to be calculated in order to go down in dimensionality - but for the reverse jump, we need the probability of re-obtaining the time the bond currently is made at. Since we have a fixed uniform distribution, this is simply 1/delta where delta is the width of the distribution.

If the bond happens to be off at the current time, we again do one of two things, with probability 1/2.
  1. Do nothing. Not very exciting, but this is only so that the jm(x) terms cancel throughout
  2. Create the bond. This is much the same as for destroying the bond above. Note that this can never cause the system to become impossible, since it can only cause sites to become infected at a lower time, but never before the site's minimum possible time.

There are a couple of things that are worth mentioning. If the min/max times that sites are infected are a small interval compared with the lifetime of the infection, it's unlikely that the path of infection will change. One bond at least (the infecting bond) must always be within the min/max bounds, and hence to change which one that is, another needs to be put there and then the original's time changed. This is possibly the reason why I am seeing some bonds stuck on in some runs that aren't stuck on in others. Secondly, I'm pretty sure I do need the probability density in the acceptance ratio, but I am still open to discussion on that point!

Results

Here are some preliminary results:
This shows the trajectory of the alpha and beta parameters. The red dot shows the true parameters used for simulation. Note that this shows that alpha and beta are highly correlated, and therefore should be reparameterized in order to more efficiently explore parameter space. This has been done, as can be clearly seen by the lines on the diagram. Before converting back into alpha and beta, the diagram looks more like this:

The reparameterization done was rather arbitrary, and simply comes from measuring the previous diagram and making sure that the new axes were approximately parallel and perpendicular to the slope. In order to check convergence, alpha and beta have been plotted against time (for alpha and beta, the red line is the true value):
The last pic confirms that the two values are uncorrelated. All of these diagrams include the 'burn in' period, which seems remarkably short for this system, something which might be worrying, although I'm not sure.

Code

Probably more for interest's sake, here is some of the code used to produce these results. Firstly, the likelihood function:
(* The likelihood function. Given the known information (k) and the current state (s), calculate the log of the likelihood *)
(* of the bond i being in its current state *)
let get_log_prob k s i =
  try 
    let ti = s.s_sitimes.bonds.(i) in        (* ti is the time of infection of bond i *)
    let (n1,n2) = s.s_sitimes.bns.(i) in     (* n1,n2 are the indices of the sites that are connected by bond i *)  
    if n1= -1 || n2 = -1 
    then 0.0                                 (* if either site does not exist, bond is impossible *)
    else
      begin
	let a' = (getparam s "alpha'").value in   (* Get some parameters from the current state *)
	let tau = (getparam s "tau").value in
	let b' = (getparam s "beta'").value in

	let (a,b) = transform (a',b') in          (* True alpha and beta are a simple linear transformation of a', b' *)

	let (ns1,ns2) = s.s_sitimes.b1ns.(i) in   (* ns1 and ns2 are lists of pairs, (a,b), where a is the index of a site *)
	                                          (* attached to i, and b is the index of the bond attaching it *)

	(* get rid of those neighbours that aren't attached to the n1 or n2 *)
	let ns1 = List.filter (fun (_,i) -> s.s_sitimes.bonds.(i) >= 0.0) ns1 in 
	let ns2 = List.filter (fun (_,i) -> s.s_sitimes.bonds.(i) >= 0.0) ns2 in

	(* get the times of infection of our neighbours, and times of removal. Essentially, this is a mapping from *)
	(* (j,b) to the events that takes place there, the bond is created at some time, and the site (a neighbour of the *)
	(* sites we're interested in) dies at some other time *)

	let rec maplist1 l current =
	  match l with
	    (j,b)::rest -> maplist1 rest ((CreateBond1,s.s_sitimes.bonds.(b))::(ItoR1,s.s_sitimes.a.(j)+.tau)::current)
	  | _ -> current
	in 
	let rec maplist2 l cur =
	  match l with
	    (j,b)::rest -> maplist2 rest ((CreateBond2,s.s_sitimes.bonds.(b))::(ItoR2,s.s_sitimes.a.(j)+.tau)::cur)
	  | _ -> cur
	in
	let ns = maplist2 ns2 (maplist1 ns1 []) in 

   (* Add in two important times: the time of death of the two sites that neighbour the bond we're interested in *)
	let ns = if k.l_is_infected.(n1) then (Dead1,s.s_sitimes.a.(n1)+.tau)::ns else ns in
	let ns = if k.l_is_infected.(n2) then (Dead2,s.s_sitimes.a.(n2)+.tau)::ns else ns in
 
	(* Sort them into time order *)
	let ns =  List.sort (fun (_,t1) (_,t2) -> compare t1 t2) ns in

	(* Make sure we're not looking into the future! Note that we can't look into the future if the bond was never made! *)
	let ns =  if (ti > 0.0) then List.filter (fun (_,t) -> t<=ti) ns else ns in

	(* This if statement special cases a bond made to the initial infection site *)
	if List.length ns = 0 && (n1=k.l_initsite || n2=k.l_initsite) 
	then (log a) -. (ti *. a) 
	else
	  begin
	    (* transform the time-ordered event list into a list of triples, (rate,start,end). *)
	    (* This is the bit where the synergy comes in! *)
	    let ns2rate i1 curn1s i2 curn2s =
	      let r1 = if i1 then a +. (float_of_int curn1s) *. b else 0.0 in
	      let r2 = if i2 then a +. (float_of_int curn2s) *. b else 0.0 in
	      let r1 = if r1 < 0.0 then 0.0 else r1 in
	      let r2 = if r2 < 0.0 then 0.0 else r2 in
	      if i1 && i2 then ((r1 +. r2) /. 2.0) else (r1 +. r2)
	    in
	    let rec fixns l i1 curn1s i2 curn2s lastt =
	      match l with 
		[] -> if ti>0.0 then [(ns2rate i1 curn1s i2 curn2s,lastt,ti)] else []
	      | (CreateBond1,newt)::ls -> (ns2rate i1 curn1s i2 curn2s,lastt,newt)::(fixns ls true (succ curn1s) i2 curn2s newt)
	      | (CreateBond2,newt)::ls -> (ns2rate i1 curn1s i2 curn2s,lastt,newt)::(fixns ls i1 curn1s true (succ curn2s) newt)
	      | (ItoR1,newt)::ls -> (ns2rate i1 curn1s i2 curn2s,lastt,newt)::(fixns ls i1 (pred curn1s) i2 curn2s newt)
	      | (ItoR2,newt)::ls -> (ns2rate i1 curn1s i2 curn2s,lastt,newt)::(fixns ls i1 curn1s i2 (pred curn2s) newt)
	      | (_,newt)::ls -> ((if ti > 0.0 then (raise (Failure (Printf.sprintf "Fix data at site: %d!" i)))); [(ns2rate i1 curn1s i2 curn2s,lastt,newt)])
	    in
	    let firsttrans = (List.hd ns) in
	    let ns = match firsttrans with
	      (CreateBond1,t) -> fixns (List.tl ns) true 1 false 0 t
	    | (CreateBond2,t) -> fixns (List.tl ns) false 0 true 1 t in 

	    (* The log probability is the integral of the rate over time *) 
	    let logprob = List.fold_left (fun acc (r,t1,t2) ->
	      acc+. (t2-.t1)*.r) 0.0 ns in
 
       (* We might be interested in the final rate, depending whether the bond was made or not *)
	    let (lastr,_,_) = (List.hd (List.rev ns)) in

	    if ti>0.0
	    then log lastr -. logprob
	    else (-. logprob)
	  end
      end
  with
    (* An exception will occur under some conditions, in which case, the state is impossible. We shouldn't really get here! *)
    x -> if s.s_sitimes.bonds.(i)>0.0 then raise (Failure (Printf.sprintf "Fix data at bond %d! (ti=%f) exception=%s" i s.s_sitimes.bonds.(i) (Printexc.to_string x))) else 0.0 (* If there are no infectious neighbours *)

Now for the 'change bond time' function:
(* Change the time of a particular bond *)
let change_time k s i usefactor =
(* A bond can be made any time between the initial time a neighbouring site *)
(* was infected to the time that site dies *)
  let (n1,n2) = s.s_sitimes.bns.(i) in  
  if n1<> -1 && n2 <> -1 && k.l_is_infected.(n1) && k.l_is_infected.(n2) then  (* If the bond _can_ be made *)
    let (min1,max1) = k.l_infect_times.a.(n1) in
    let (min2,max2) = k.l_infect_times.a.(n2) in
    let min = Pervasives.max min1 min2 in
    let max = (pval s "tau") +. Pervasives.min max1 max2 in    (* min and max are the min and max times the bond can be made *)
    let range = max -. min in                                  (* 1/range is the probability density function *)
    let p = Random.float 1.0 in
    let newtime = min +. (max -. min) *. p in                  
    let oldtime = s.s_sitimes.bonds.(i) in
    let p2 = 0.5 in 
    let newtime = if Random.float 1.0 > p2 then newtime else -1.0 in  (* p2 is probability of proposing a time, (1-p2) is     *)
                                                                      (* probability of proposing the bond is not made       *)
                                                                      (* this is regardless of the current state of the bond *)

    (* This factor was missing before - it is the ratio of the probability densities for the RJMCMC metropolis-hastings *)
    (* acceptance ratio *)
    let factor = 
      if oldtime < 0.0 && newtime < 0.0      (* staying in same state-space *)
      then 1.0
      else if oldtime < 0.0 && newtime > 0.0 (* moving > in dimensionality *)
      then range
      else if oldtime > 0.0 && newtime < 0.0 (* moving < in dimensionality *)
      then 1.0 /. range 
      else 1.0                               (* staying in same dimensionality *)
    in
    try
      s.s_sitimes.bonds.(i) <- newtime;                  (* Set the new time in place *)
      let (before,after) = find_l_change_local k s i in  (* Calculate the change in the *local* likelihood *)
      let delta = exp (after -. before) in               (* Ratio of the likelihoods *)
    
      (* pacc is the acceptance probability. The 'Do, Dont and DoOneOver' bits were because I wasn't sure that the 'factor' *)
      (* term was correct!! *)
      let pacc = Pervasives.min 1.0 (delta  *. (match usefactor with Dont -> 1.0 | Do -> factor | DoOneOver -> (1.0 /. factor))) in
      if Random.float 1.0 > pacc 
             (* change was not accepted - replace the old time *)
      then (s.s_sitimes.bonds.(i) <- oldtime; fix_sites k s n1 n2; (false,Some pacc, change_type newtime oldtime)) 
             (* change was accepted! recache the changed loglikelihood *)
      else (change_loglikelihood k s i; (true,Some pacc, change_type newtime oldtime))
    with
      e -> 
	s.s_sitimes.bonds.(i) <- oldtime; (* any exception, and we restore the old time *)
	fix_sites k s n1 n2;
	(false,None,change_type newtime oldtime)
  else (false,None,NoChange)