## accelerated ABC

**R**ichard Wilkinson arXived a paper on accelerated ABC during MCMSki 4, a paper I almost missed when quickly perusing the daily arXiv list. This is another illustration of the “invasion of Gaussian processes” into ABC settings, maybe under the influence of machine learning.

**T**he paper starts with a link to the synthetic likelihood approximation of Wood (2010, Nature), as in Richard Everitt’s talk last week. Richard (W.) presents the generalised ABC as a kernel-based acceptance probability, using a kernel π(y|x), where x is the observed data and y = x(θ) the simulated one. He proposes a Gaussian process modelling of the log-likelihood (at the observed data x), with a quadratic (in θ) mean and a Matérn covariance matrix. Hence the connection with Wood’s synthetic likelihood. Another connection is with Nicolas’ talk on QMC(MC): the θ’s are chosen following a Sobol sequence “in order to minimize the number of design points”, which requires a reparameterisation to [0,1]^p… I find this “uniform” exploration of the whole parameter space delicate to envision in complex parameter spaces and realistic problems, since the likelihood is highly concentrated on a tiny subregion of the original [0,1]^p. Not to mention the issue of the spurious mass on the boundaries of the hypercube possibly induced by the change of variable. The sequential algorithm of Richard also attempts to eliminate implausible zones of the parameter space, i.e. zones where the likelihood is essentially zero. My worries with this interesting notion are that (a) the early Gaussian process approximations may be poor and hence exclude zones they should not; (b) all Gaussian process approximations at all iterations must be saved; (c) the Sobol sequences apply to the whole [0,1]^p at each iteration while the non-implausible region shrinks at each iteration, which induces a growing inefficiency in the algorithm. The Sobol sequence should instead be restricted to the previous non-implausible zone.
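The ingredients described above can be put together in a short sketch: a Sobol design over [0,1]^p, a GP on the log-likelihood with a quadratic mean and Matérn 3/2 covariance, and an implausibility screen. This is only a toy illustration, not the paper’s implementation: the simulator is replaced by a cheap Gaussian bump, and the lengthscale, nugget, and implausibility cut-off are arbitrary choices.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
p = 2  # parameter dimension (toy)

def loglik(theta):
    # stand-in for the expensive simulator's log-likelihood: a cheap
    # Gaussian bump so the sketch runs end to end
    return -50.0 * ((theta - 0.5) ** 2).sum(-1)

# 1. space-filling design on [0,1]^p from a Sobol sequence
n = 64
design = qmc.Sobol(d=p, seed=0).random(n)       # n x p design points
y = np.array([loglik(t) for t in design])       # log-likelihood at the design

# 2. quadratic-in-theta mean function, fitted by least squares
def quad_basis(X):
    return np.hstack([np.ones((len(X), 1)), X, X ** 2])

H = quad_basis(design)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
resid = y - H @ beta

# 3. Matérn 3/2 covariance on the residuals (lengthscale is a tuning choice)
def matern32(A, B, length=0.25, var=1.0 + resid.var()):
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    r = np.sqrt(3.0) * d / length
    return var * (1.0 + r) * np.exp(-r)

K = matern32(design, design) + 1e-8 * np.eye(n)   # small nugget for stability
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))

def gp_predict(X):
    # GP predictive mean and variance of the log-likelihood
    Ks = matern32(X, design)
    mean = quad_basis(X) @ beta + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(np.diag(matern32(X, X)) - (v ** 2).sum(0), 1e-12, None)
    return mean, var

# 4. implausibility screen: discard theta whose predicted log-likelihood is
#    credibly far below the best value seen (the cut-off is a tuning choice)
cand = qmc.Sobol(d=p, seed=1).random(256)
m, v = gp_predict(cand)
plausible = m + 3.0 * np.sqrt(v) > y.max() - 10.0
```

Point (c) above is visible here: `cand` covers the full hypercube at every wave, so as `plausible` shrinks, an ever larger share of the Sobol points is wasted.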

**O**verall, an interesting proposal that would need more prodding to understand whether or not it is robust to poor initialisation and complex structures. And a proposal belonging to the estimated-likelihood branch of ABC, in that it makes use of the final Gaussian process approximation to run an MCMC algorithm, without ever returning to pseudo-data simulation, which is replaced with log-likelihood simulation.
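This last step, running MCMC on the surrogate alone, can be sketched in a few lines: a plain random-walk Metropolis sampler on [0,1]^p with a uniform prior, where the expensive simulator call is replaced by the GP predictive mean of the log-likelihood. The `gp_loglik` stand-in below is a hypothetical placeholder, not the fitted GP of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def gp_loglik(theta):
    # stand-in for the fitted GP's predictive mean of the log-likelihood;
    # in the actual method this would be the final Gaussian process
    return -50.0 * ((theta - 0.5) ** 2).sum()

def mh_on_surrogate(n_iter=5000, step=0.05, p=2):
    # random-walk Metropolis on [0,1]^p under a uniform prior: the usual
    # acceptance ratio, with zero further simulator calls
    theta = np.full(p, 0.5)
    cur = gp_loglik(theta)
    chain = np.empty((n_iter, p))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(p)
        if (prop < 0).any() or (prop > 1).any():   # outside the prior support
            chain[i] = theta
            continue
        new = gp_loglik(prop)
        if np.log(rng.random()) < new - cur:       # accept/reject
            theta, cur = prop, new
        chain[i] = theta
    return chain

chain = mh_on_surrogate()
```

The cost of this chain is independent of the simulator, which is the whole appeal; the danger, as noted above, is that it inherits every defect of the GP approximation.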

“These algorithms sample space randomly and naively and do not learn from previous simulations”

**T**he above criticism is moderated in a footnote about ABC-SMC using the “current parameter value to determine which move to make next [but] parameters visited in previous iterations are not taken into account”. I still find it excessive in that SMC algorithms, and in particular ABC-SMC algorithms, are completely free to use the whole past to build the new proposal. This was clearly enunciated in our earlier population Monte Carlo papers. For instance, the complete collection of past particles can be recycled by computing weights through our AMIS algorithm, as illustrated by Jukka Corander in one genetics application.
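The recycling idea can be made concrete with a minimal AMIS-style sketch on a toy target, assuming Gaussian proposals adapted by moment matching (none of the numbers below come from the genetics application): after several adaptive stages, every past particle is reweighted against the equal-weight mixture of all proposals used so far, so no simulation is discarded.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def log_target(x):
    # toy target: a standard normal
    return norm.logpdf(x)

# adaptive importance sampling: each stage fits a new Gaussian proposal
# to the current weighted sample
mus, sigs, samples = [], [], []
mu, sig = 3.0, 2.0                     # deliberately poor initial proposal
for t in range(5):
    x = rng.normal(mu, sig, size=500)
    mus.append(mu); sigs.append(sig); samples.append(x)
    w = np.exp(log_target(x) - norm.logpdf(x, mu, sig))
    w /= w.sum()
    mu = (w * x).sum()                               # moment-match the target
    sig = np.sqrt((w * (x - mu) ** 2).sum()) + 1e-6

# AMIS-style recycling: reweight *all* past particles against the
# deterministic mixture of every proposal used so far
allx = np.concatenate(samples)
mix = np.mean([norm.pdf(allx, m, s) for m, s in zip(mus, sigs)], axis=0)
w_all = np.exp(log_target(allx)) / mix
w_all /= w_all.sum()
est_mean = (w_all * allx).sum()   # uses all 2500 particles, none wasted
```

The mixture denominator is what stabilises the weights of early, poorly targeted particles, which is precisely the sense in which "parameters visited in previous iterations" are taken into account.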

February 13, 2015 at 2:49 pm

I’ve only just read the Wilkinson GP-ABC paper and my impression is that unfortunately he presents only ‘strawman’ examples in support of his proposal, so at this stage it’s difficult to decide whether it would be worth an applied researcher like myself investing time in that specific direction.

Like you, I don’t believe it’s fair that he lumps SMC-ABC methods into his criticism of MCMC-ABC methods: “(i) They sample space randomly and only learn from previous simulations in the limited sense of using the current parameter value to determine which move to make next; (ii) They do not exploit known properties of the likelihood function, such as continuity or smoothness.”

– In fact, SMC methods will adapt their refreshment kernel using a smooth parametric form which (i) learns from at least the current set of live particles, but can go back further (e.g. AMIS-like strategies) and (ii) utilises our rule-of-thumb expectation of mostly small variation in the posterior density on small scales without the required assumption of continuity/smoothness (for which the GABC kernel is introduced in Wilkinson’s paper).
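Point (i) of this reply can be illustrated by a minimal sketch of an adapted refreshment step, assuming a Gaussian parametric form and invented toy particles (the particle values, weights, and the factor-2 covariance inflation are all arbitrary choices, not taken from any of the papers discussed):

```python
import numpy as np

rng = np.random.default_rng(4)

# live weighted particles from a hypothetical intermediate SMC-ABC iteration
particles = rng.normal(1.0, 0.5, size=(300, 2))
w = rng.random(300)
w /= w.sum()

# fit a smooth parametric (here Gaussian) refreshment kernel to the live set
mu = w @ particles                        # weighted mean
cov = np.cov(particles.T, aweights=w)     # weighted covariance

# refresh step: independent proposal fitted to the sample, with the usual
# covariance inflation to avoid over-concentration
refreshed = rng.multivariate_normal(mu, 2.0 * cov, size=300)
```

Replacing the live set with the full particle history in the fit above is exactly the AMIS-like extension mentioned in the comment.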

It also seems to me to be silly to compare history matching GP ABC with either Wood’s synthetic likelihood (random walk MCMC at core) or rejection ABC (prior sampling), as in the two worked examples. A fairer comparison would be to something like the ABC equivalent of Raftery’s Incremental Mixture Importance Sampling (e.g. http://www.malariajournal.com/content/14/1/6 for an application) which also uses successive waves of sampling within bounded plausibility regions.

January 18, 2014 at 2:35 pm

Using a minimal number of simulations is also what drew me to these methods (I have also been working on something similar to the two papers mentioned here and hope to put something out about it soon). I think that some of the appeal of things like synthetic likelihood is that they exploit some kind of parametric assumptions to help out with this – as opposed to ABC which is more like a kernel density estimator. One can imagine lots of extensions, and also maybe some useful methods that are tailored to specific applications.
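The contrast drawn here, a parametric fit for synthetic likelihood versus a kernel-density-like estimate for ABC, can be shown side by side on a toy simulator (the simulator, summaries, and tolerance below are all invented for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

def simulate_summaries(theta, n=200):
    # stand-in simulator: two summary statistics, Gaussian around theta
    return rng.normal(theta, 1.0, size=(n, 2))

s_obs = np.array([0.1, -0.2])     # "observed" summaries
s = simulate_summaries(0.0)       # simulations at a candidate theta

# Wood's synthetic likelihood: a parametric (Gaussian) fit to the summaries
mu, Sigma = s.mean(0), np.cov(s.T)
sl = multivariate_normal.logpdf(s_obs, mu, Sigma)

# ABC-style estimate: a Gaussian kernel density estimate at the observed
# summaries, with the ABC tolerance eps playing the role of the bandwidth
eps = 0.5
d2 = ((s - s_obs) ** 2).sum(1)
kde = np.log(np.mean(np.exp(-0.5 * d2 / eps**2)) / (2 * np.pi * eps**2))
```

Both quantities estimate the same log-likelihood at the candidate θ, but the parametric version borrows strength from the Gaussian assumption while the kernel version pays a bias that grows with the tolerance, which is one way of reading the trade-off described in this comment.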

January 18, 2014 at 10:55 am

Thanks for this discussion and reference to our recent arXiv submission. I also agree that some details of these methods can be improved and a GP sometimes needs more supervision than one would like, but instead of focussing on these details I want to emphasize the importance of the problem that we are trying to address. I have been working with a number of biologists who have built sophisticated simulators but have no idea how to optimize the parameters. They can get really hung up on this issue. For instance, if their simulations do not predict the observations well, is this because they have not yet found the optimal parameter setting or is this because their model is wrong? Conversely, if their model does describe the data well, is this because this is a realistic model of nature or is this because they overfit? This issue is particularly acute because every simulation can be quite expensive to run (as an extreme example, earthquake simulations such as the SCEC Shakeout simulation sometimes take a day on a supercomputer with 200K cores at 200Tflops/sec to finish).

The recent papers by Rich and Ted and me are a first attempt to address these difficult issues by developing ABC methods that require a minimal number of simulations. I agree that certain details need and will be improved over time, but I sometimes miss the big picture view: let’s first decide that this is a very important problem to work on and let’s acknowledge that these are the first baby steps in that direction with many details to be improved in followup work. I hope others will join us in our enthusiasm to work on these hard but important problems.

January 15, 2014 at 9:45 pm

Thanks for the coverage Christian and for the suggested links – I’ll look into these. There is also the paper by Dahlin and Lindsten in which they use GPs to speed up parameter estimation in SSMs arxiv.org/abs/1311.0689 and Richard E. has already pointed out that Ted Meeds and Max Welling posted a very similar paper to mine yesterday. Their approach differs in that they model the first two moments of the summary of the simulator output, whereas I model the log likelihood function. I think both approaches have their merits. The log-likelihood function is harder to model (hence the need for the waves of history matching), but is 1d and avoids us having to make any parametric assumptions about the form of the likelihood (they use Simon Wood’s approach and assume a Gaussian synthetic likelihood).

I agree that the approach needs more “prodding” – at present it works with heavy supervision. I don’t necessarily think this is a problem, as GP emulation always seems to need careful supervision, and this hasn’t stopped it becoming a useful tool. However, it would be nice if the automation could be trusted.

The motivation for the work wasn’t from ML btw, but from trying to extend O’Hagan (and others) work on calibration of expensive deterministic functions to expensive stochastic simulators. If your simulator is cheap enough to be able to do standard ABC, then obviously that is a better choice. I think Ted, Max and I all have in mind that these GP-ABC methods will hopefully be useful in expensive models where we don’t have the luxury of being able to do a lot of simulation.

January 14, 2014 at 11:30 pm

Another paper related to this one appeared today: http://arxiv.org/abs/1401.2838.