I haven’t had time to say much recently, due to some travel for work, but I did have a chance to prototype the Hit-and-Run/Adaptive Metropolis approach to MCMC that I mentioned in my last post. Was that really three weeks ago? How time flies.
Anyway, thanks to the tip Aram pointed out in the comments, Hit-and-Run can take steps in the correct direction without explicitly computing an approximation of the covariance matrix, just by taking a randomly weighted sum of random points. It’s nearly magic, although Aram says that it actually makes plenty of sense. The code for this “H-RAM” looks pretty similar to my original Hit-and-Run step method, and it’s short enough that I’ll just show it to you:
    from numpy import array, dot, exp, zeros
    from pymc import Gibbs, Metropolis, ZeroProbability, flib, rcategorical, rnormal

    class HRAM(Gibbs):
        def __init__(self, stochastic, proposal_sd=None, verbose=None):
            Metropolis.__init__(self, stochastic, proposal_sd=proposal_sd,
                                verbose=verbose, tally=False)
            self.proposal_tau = self.proposal_sd**-2.
            self.n = 0
            self.N = 11
            # store N points near the current value; their empirical spread
            # determines the step directions below
            self.value = rnormal(self.stochastic.value, self.proposal_tau,
                                 size=tuple([self.N] + list(self.stochastic.value.shape)))

        def step(self):
            x0 = self.value[self.n]
            u = rnormal(zeros(self.N), 1.)
            # a randomly weighted sum of the stored points gives a direction
            # with the right covariance, with no explicit covariance matrix
            dx = dot(u, self.value)

            self.stochastic.value = x0
            logp = [self.logp_plus_loglike]
            x_prime = [x0]

            # grid search along the line through x0 in direction dx,
            # with exponentially spaced trial points in both directions
            for direction in [-1, 1]:
                for i in xrange(25):
                    delta = direction*exp(.1*i)*dx
                    try:
                        self.stochastic.value = x0 + delta
                        logp.append(self.logp_plus_loglike)
                        x_prime.append(x0 + delta)
                    except ZeroProbability:
                        self.stochastic.value = x0

            # choose among the trial points with probability proportional
            # to their posterior density
            i = rcategorical(exp(array(logp) - flib.logsum(logp)))
            self.value[self.n] = x_prime[i]
            self.stochastic.value = x_prime[i]

            if i == 0:
                self.rejected += 1
            else:
                self.accepted += 1

            self.n += 1
            if self.n == self.N:
                self.n = 0
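To see why the weighted sum points in the right direction: stack the stored points as the rows y_1, …, y_N of a matrix Y, so that dx = dot(u, self.value) is Y^T u with u ~ N(0, I_N). Conditional on the history,

$$dx \mid Y \sim \mathcal{N}\!\left(0,\; Y^T Y\right), \qquad Y^T Y = \sum_{i=1}^N y_i y_i^T,$$

so the proposal direction automatically has the empirical (uncentered) second-moment matrix of the stored points as its covariance, without that matrix ever being formed explicitly.

And a minimal usage sketch, assuming a PyMC 2 model with a vector-valued stochastic X (the names here are illustrative):

    import pymc as mc

    # X is a hypothetical vector-valued stochastic from some model
    model = mc.MCMC([X])
    model.use_step_method(HRAM, X, proposal_sd=.1)
    model.sample(iter=10000)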
Compare the results to the plain old Hit-and-Run step method:
Abie,
Great stuff! Two questions. First, how hard is it to show that H-RAM samples from the target distribution? Second, could the same trick for avoiding explicit covariance computation work for PyMC’s standard Adaptive Metropolis?
Anand
Hi Anand, have a look at the section that Aram pointed me to in his comment here. In short, it seems tremendously simple to show that the stationary distribution is as intended for the H-RAM, but when I tried my hand at extending it to the “standard” Adaptive Metropolis, I got totally confused.
It seems like it should work for AM, though. Pretty cool if it does, huh?
Hehe, section 5.1 is vastly less intimidating than most of the paper… I guess I didn’t need to procrastinate. Do you buy this as an argument for its validity with the standard (wrt PyMC) AM:
Say you’re trying to sample a variable x. If you solve the problem of sampling a length-m iid vector (x_1, …, x_m), each of whose elements is distributed identically to x, you can slice your joint samples to get the samples you want. Think of the history in section 5.1 as the current values of x_1, …, x_m. If you update the i’th entry by a Metropolis step with a Gaussian proposal, you can choose the proposal covariance as in section 5.1 without violating the Markov property. So you still have a valid Metropolis updating scheme.
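Here is a minimal sketch of that ensemble idea, with a generic log_p standing in for the model’s log-posterior (names here are illustrative, not PyMC internals). The proposal for copy i uses only the other copies, so its covariance doesn’t depend on the value being updated:

    import numpy as np

    def ensemble_am_step(X, log_p, rng=np.random):
        """One sweep over an ensemble X of shape (m, d)."""
        m, d = X.shape
        for i in range(m):
            others = np.delete(X, i, axis=0)
            # empirical covariance of the rest of the ensemble, with a
            # little jitter to keep the matrix positive definite
            C = np.cov(others, rowvar=False) + 1e-6 * np.eye(d)
            x_new = rng.multivariate_normal(X[i], C)
            # symmetric proposal, so the plain Metropolis ratio applies
            if np.log(rng.uniform()) < log_p(x_new) - log_p(X[i]):
                X[i] = x_new
        return X

Since each entry’s proposal is a symmetric Gaussian whose parameters don’t depend on that entry, each update is an ordinary Metropolis step on the product target.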
In line 21, since you’re spacing your trial points nonuniformly (the exp(.1*i) factor in the delta computation), do you need to weight the logp’s of the x_primes by the Jacobian of the trial point ‘distribution’?
Also, is it possible to replace the grid search along the line in lines 19-31 (the double loop over directions and trial points) with a Metropolis step along the same line, to save log-probability evaluations?
Not the Jacobian, just the density. Got confused there.
Hi Anand,
For your argument adapting Bertsimas and Vempala’s section 5.1, I agree that this is a Markovian update, but how do you know that choosing this strange covariance matrix still yields the intended stationary distribution?
I think you’re right about the need for a correction term, due to the way I space my trial points. The theoretical version, which chooses from the whole line, is so simple; I was trying to come up with an approximation that only evaluates a limited number of points.
Replacing the grid search with a Metropolis step would make this basically AM, right? There is supposed to be a benefit to the Hit-and-Run part, at least theoretically, where it can get out of a tight corner that Metropolis can’t. Maybe taking a bunch of Metropolis steps along the same one-dimensional subspace would have all of the benefits, but no need for complicated weighting.
There is the additional benefit of the grid search in my cross-diagonal example, where the walk can tunnel through a low-probability region to get to another high-probability part. But I wouldn’t count on this working well for distributions with many modes in higher dimensions.
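For concreteness, here is a minimal sketch of the Metropolis-steps-along-the-line idea, with a generic log_p standing in for the model’s log-posterior:

    import numpy as np

    def metropolis_along_line(x0, dx, log_p, n_steps=5, step_sd=1., rng=np.random):
        """Take a few Metropolis steps restricted to the line x0 + t*dx."""
        x, lp = x0, log_p(x0)
        for _ in range(n_steps):
            # symmetric one-dimensional proposal along the shared direction
            x_new = x + rng.normal(0., step_sd) * dx
            lp_new = log_p(x_new)
            if np.log(rng.uniform()) < lp_new - lp:
                x, lp = x_new, lp_new
        return x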
Sounds a lot like Differential Evolution samplers:

This one provides the basis for the DREAM_ZS extension (also see the second paper):

C.J.F. ter Braak and J.A. Vrugt, Differential evolution Markov chain with snooker updater and fewer chains, Statistics and Computing, 18(4), 435-446, doi:10.1007/s11222-008-9104-9, 2008.

This one introduces the original DREAM idea:

J.A. Vrugt, C.J.F. ter Braak, C.G.H. Diks, D. Higdon, B.A. Robinson, and J.M. Hyman, Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, International Journal of Nonlinear Sciences and Numerical Simulation, 10(3), 273-290, 2009.
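For reference, the core DE-MC proposal from ter Braak (2006), which these papers build on, perturbs chain i by a scaled difference of two other randomly chosen chains; a minimal sketch, with a generic log_p standing in for the posterior:

    import numpy as np

    def de_mc_step(X, log_p, rng=np.random):
        """One DE-MC sweep over an ensemble X of shape (m, d)."""
        m, d = X.shape
        gamma = 2.38 / np.sqrt(2. * d)  # scaling suggested by ter Braak (2006)
        for i in range(m):
            # pick two distinct chains other than i
            r1, r2 = rng.choice([j for j in range(m) if j != i], size=2, replace=False)
            # the proposal moves along the difference of two chains,
            # plus a little noise for ergodicity
            x_new = X[i] + gamma * (X[r1] - X[r2]) + rng.normal(0., 1e-6, size=d)
            if np.log(rng.uniform()) < log_p(x_new) - log_p(X[i]):
                X[i] = x_new
        return X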
Thanks for the refs, @jsalvatier. I hope to have time to take a look soon. Does “differential” mean that they rely on the derivative of the likelihood (or posterior or something)?
Pingback: MCMC in Python: Part II of PyMC Step Methods and their pitfalls | Healthy Algorithms