I haven’t had time to say much recently, due to some travel for work, but I did have a chance to prototype the Hit-and-Run/Adaptive Metropolis approach to MCMC that I mentioned in my last post. Was that really three weeks ago? How time flies.

Anyway, thanks to the tip Aram pointed out in the comments, Hit-and-Run can take steps in the correct direction without explicitly computing an approximation of the covariance matrix, just by taking a randomly weighted sum of random points. It’s nearly magic, although Aram says that it actually makes plenty of sense. The code for this “H-RAM” looks pretty similar to my original Hit-and-Run step method, and it’s short enough that I’ll just show it to you:

```python
# Assumes PyMC 2.x: Gibbs, Metropolis, rnormal, rcategorical,
# ZeroProbability, and flib all come from that package.
from numpy import array, dot, exp, zeros
from pymc import Gibbs, Metropolis, ZeroProbability, flib, rcategorical, rnormal

class HRAM(Gibbs):
    def __init__(self, stochastic, proposal_sd=None, verbose=None):
        Metropolis.__init__(self, stochastic, proposal_sd=proposal_sd,
                            verbose=verbose, tally=False)
        self.proposal_tau = self.proposal_sd**-2.
        self.n = 0
        self.N = 11
        # a small population of points that evolves along with the chain
        self.value = rnormal(self.stochastic.value, self.proposal_tau,
                             size=tuple([self.N] + list(self.stochastic.value.shape)))

    def step(self):
        x0 = self.value[self.n]
        u = rnormal(zeros(self.N), 1.)
        # randomly weighted sum of the population, in place of an
        # explicit estimate of the covariance matrix
        dx = dot(u, self.value)

        self.stochastic.value = x0
        logp = [self.logp_plus_loglike]
        x_prime = [x0]

        # evaluate the target at exponentially spaced points along the line
        for direction in [-1, 1]:
            for i in xrange(25):
                delta = direction*exp(.1*i)*dx
                try:
                    self.stochastic.value = x0 + delta
                    logp.append(self.logp_plus_loglike)
                    x_prime.append(x0 + delta)
                except ZeroProbability:
                    self.stochastic.value = x0

        # sample one of the trial points with probability
        # proportional to its posterior density
        i = rcategorical(exp(array(logp) - flib.logsum(logp)))
        self.value[self.n] = x_prime[i]
        self.stochastic.value = x_prime[i]

        if i == 0:
            self.rejected += 1
        else:
            self.accepted += 1

        self.n += 1
        if self.n == self.N:
            self.n = 0
```
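The `dot(u, self.value)` line is where the near-magic happens. Here is a standalone NumPy sketch (my own illustration with made-up numbers, not part of the step method) of why a randomly weighted sum of sample points proposes in the right directions: its covariance is proportional to the covariance of the points themselves, even though that matrix is never formed:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for the sampler's current population: 1000 draws from a
# strongly correlated 2-d Gaussian with covariance C
C = np.array([[2.0, 1.8], [1.8, 2.0]])
pts = rng.multivariate_normal(np.zeros(2), C, size=1000)
centered = pts - pts.mean(axis=0)

# one proposal direction, as in HRAM.step: dx = dot(u, points)
u = rng.standard_normal(len(pts))
dx = u @ centered

# repeat many times: the directions recover C up to a scale factor,
# with no covariance computation anywhere
U = rng.standard_normal((5000, len(pts)))
DX = U @ centered
C_hat = np.cov(DX.T) / len(pts)
print(C_hat)  # close to C
```

(I centered the points for clarity; the step method above uses the raw population values.)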

Compare the results to the plain old Hit-and-Run step method:


Filed under MCMC

Tagged as hit-and-run, MCMC, pymc, python

Abie,

Great stuff! Two questions. First, how hard is it to show that H-RAM samples from the target distribution? Second, could the same trick for avoiding explicit covariance computation work for PyMC’s standard Adaptive Metropolis?

Anand

Hi Anand, have a look at the section that Aram pointed me to in his comment here. In short, it seems tremendously simple to show that the stationary distribution is as intended for the H-RAM, but when I tried my hand at extending it to the “standard” Adaptive Metropolis, I got totally confused.

It seems like it should work for AM, though. Pretty cool if it does, huh?

Hehe, section 5.1 is vastly less intimidating than most of the paper… I guess I didn’t need to procrastinate. Do you buy this as an argument for its validity with the standard (wrt PyMC) AM:

Say you’re trying to sample a variable x. If you solve the problem of sampling a length-m iid vector (x_1, …, x_m), each of whose elements is distributed identically to x, you can slice your joint samples to get the samples you want.

Think of the history in section 5.1 as current values of (x_1, …, x_m). If you update the i’th entry by a Metropolis step with a Gaussian proposal, you can choose the proposal covariance as in section 5.1 without violating the Markov property. So you still have a valid Metropolis updating scheme.
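[Ed.: as a toy numerical check of this argument, here is a sketch of my own, not PyMC’s AdaptiveMetropolis: run Metropolis on m iid copies of a scalar target, proposing each update as a randomly weighted sum of the centered values of the other copies. The marginal of each copy should come out as the target.]

```python
import numpy as np

rng = np.random.default_rng(2)

def logp(x):
    # target for each copy: standard normal, up to a constant
    return -0.5 * x**2

m = 10
X = rng.normal(0.0, 3.0, size=m)  # overdispersed start
samples = []
for it in range(20000):
    i = it % m
    others = np.delete(X, i)
    # symmetric proposal built from the other copies' current values,
    # with no explicit variance/covariance computation
    dx = 0.5 * rng.normal(size=m - 1) @ (others - others.mean())
    x_new = X[i] + dx
    # standard Metropolis accept/reject for the i'th copy
    if np.log(rng.uniform()) < logp(x_new) - logp(X[i]):
        X[i] = x_new
    samples.append(X[i])
samples = np.array(samples[5000:])
print(samples.mean(), samples.std())  # near 0 and 1
```

Conditioned on the other copies, the proposal increment is symmetric about zero, so the usual Metropolis ratio applies and the product target is preserved.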

In line 21, since you’re spacing your trial points nonuniformly, do you need to weight the logp’s of the x_primes by the Jacobian of the trial point ‘distribution’?

Also, is it possible to replace the grid search along the line in lines 19-31 with a Metropolis step along the same line, to save log-probability evaluations?

Not the Jacobian, just the density. Got confused there.

Hi Anand,

For your argument adapting Bertsimas and Vempala’s section 5.1, I agree that this is a Markovian update, but how do you know that choosing this strange covariance matrix still yields the intended stationary distribution?

I think you’re right about the need for a correction term due to my nonuniform sampling. The theoretical version, which chooses a point from the whole line, is so simple; I was trying to come up with an approximation that evaluates only a limited number of points.

Replacing the grid search with a Metropolis step would make this basically AM, right? There is supposed to be a benefit to the Hit-and-Run part, at least theoretically, where it can get out of a tight corner that Metropolis can’t. Maybe taking a bunch of Metropolis steps along the same one-dimensional subspace would have all of the benefits, but no need for complicated weighting.

There is the additional benefit of the grid search in my cross-diagonal example, where the walk can tunnel through a low-probability region to get to another high-probability part. But I wouldn’t count on this working well for distributions with many modes in higher dimensions.
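[Ed.: to make the tunneling point concrete, here is a small sketch, a toy example of my own rather than the cross-diagonal example mentioned above: a 1-d target with two well-separated modes, where a noticeable share of the grid search’s selection probability lands in the far mode in a single step.]

```python
import numpy as np

def logp(x):
    # two well-separated modes, at -5 and +5
    return np.logaddexp(-0.5 * (x + 5)**2, -0.5 * (x - 5)**2)

x0 = -5.0  # start at the left mode
dx = 1.0   # unit step along the (one-dimensional) line

# the same exponentially spaced trial schedule as in HRAM.step
trials = [x0] + [x0 + d * np.exp(0.1 * i) * dx
                 for d in (-1, 1) for i in range(25)]
logps = np.array([logp(x) for x in trials])
p = np.exp(logps - logps.max())
p /= p.sum()

# probability that one step tunnels through the trough to the right mode
p_tunnel = sum(pi for xi, pi in zip(trials, p) if xi > 0)
print(p_tunnel)
```

A plain Metropolis step with a modest proposal scale would almost never make this jump, since every intermediate point has very low density.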

Sounds a lot like Differential Evolution samplers:

Provides the basis for the DREAM_ZS extension (also see the second paper):

C.J.F. ter Braak and J.A. Vrugt, Differential evolution Markov chain with snooker updater and fewer chains, Statistics and Computing, 18(4), 435–446, doi:10.1007/s11222-008-9104-9, 2008.

Introduces the original DREAM idea:

J.A. Vrugt, C.J.F. ter Braak, C.G.H. Diks, D. Higdon, B.A. Robinson, and J.M. Hyman, Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, International Journal of Nonlinear Sciences and Numerical Simulation, 10(3), 273–290, 2009.

Thanks for the refs, @jsalvatier. I hope to have time to take a look soon. Does “differential” mean that they rely on the derivative of the likelihood (or posterior or something)?

Pingback: MCMC in Python: Part II of PyMC Step Methods and their pitfalls | Healthy Algorithms