# Introducing the H-RAM

I haven’t had time to say much recently, due to some travel for work, but I did have a chance to prototype the Hit-and-Run/Adaptive Metropolis approach to MCMC that I mentioned in my last post.  Was that really three weeks ago?  How time flies.

Anyway, thanks to the tip Aram pointed out in the comments, Hit-and-Run can take steps in the correct direction without explicitly computing an approximation of the covariance matrix, just by taking a randomly weighted sum of random points.  It’s nearly magic, although Aram says that it actually makes plenty of sense. The code for this “H-RAM” looks pretty similar to my original Hit-and-Run step method, and it’s short enough that I’ll just show it to you:

```python
# PyMC 2-era code (Python 2); imports added for completeness
from numpy import array, dot, exp, zeros
from pymc import Gibbs, Metropolis, ZeroProbability, flib, rcategorical, rnormal

class HRAM(Gibbs):
    def __init__(self, stochastic, proposal_sd=None, verbose=None):
        Metropolis.__init__(self, stochastic, proposal_sd=proposal_sd,
                            verbose=verbose, tally=False)
        self.proposal_tau = self.proposal_sd**-2.
        self.n = 0
        self.N = 11
        # a small population of points around the current value; random
        # weighted sums of these supply the proposal directions
        self.value = rnormal(self.stochastic.value, self.proposal_tau,
                             size=tuple([self.N] + list(self.stochastic.value.shape)))

    def step(self):
        x0 = self.value[self.n]
        u = rnormal(zeros(self.N), 1.)
        dx = dot(u, self.value)  # random weighted sum of the stored points

        self.stochastic.value = x0
        logp = [self.logp_plus_loglike]
        x_prime = [x0]

        # evaluate exponentially spaced trial points in both directions
        # along the line through x0
        for direction in [-1, 1]:
            for i in xrange(25):
                delta = direction*exp(.1*i)*dx
                try:
                    self.stochastic.value = x0 + delta
                    logp.append(self.logp_plus_loglike)
                    x_prime.append(x0 + delta)
                except ZeroProbability:
                    self.stochastic.value = x0

        # sample one trial point with probability proportional to its density
        i = rcategorical(exp(array(logp) - flib.logsum(logp)))
        self.value[self.n] = x_prime[i]
        self.stochastic.value = x_prime[i]

        if i == 0:
            self.rejected += 1
        else:
            self.accepted += 1

        self.n += 1
        if self.n == self.N:
            self.n = 0
```
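The trick this rests on can be checked numerically. If $u \sim N(0, I)$ and the rows of $X$ are the stored points, then $dx = uX$ has covariance $X^\top X$, so the proposal directions pick up the spread of the stored points without any explicit covariance estimate. A quick sanity check (all the names and numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stored points X (N x d) drawn from a strongly correlated distribution,
# standing in for the HRAM's population of values.
N, d = 11, 2
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal(np.zeros(d), true_cov, size=N)

# Many random weighted sums u @ X with u ~ N(0, I).
U = rng.standard_normal((100_000, N))
directions = U @ X  # each row is one proposal direction

# Their empirical covariance matches X.T @ X -- no covariance matrix
# was ever computed to generate them.
emp = directions.T @ directions / len(directions)
print(np.allclose(emp, X.T @ X, rtol=0.05))  # → True
```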

Compare the results to the plain old Hit-and-Run step method:

Filed under MCMC

### 9 responses to “Introducing the H-RAM”

1. Anand Patil

Abie,

Great stuff! Two questions. First, how hard is it to show that H-RAM samples from the target distribution? Second, could the same trick for avoiding explicit covariance computation work for PyMC’s standard Adaptive Metropolis?

Anand

2. Hi Anand, have a look at the section that Aram pointed me to in his comment here. In short, it seems tremendously simple to show that the stationary distribution is as intended for the H-RAM, but when I tried my hand at extending it to the “standard” Adaptive Metropolis, I got totally confused.

It seems like it should work for AM, though. Pretty cool if it does, huh?

3. Anand Patil

Hehe, section 5.1 is vastly less intimidating than most of the paper… I guess I didn’t need to procrastinate. Do you buy this as an argument for its validity with the standard (wrt PyMC) AM:

Say you’re trying to sample a variable $x$. If you solve the problem of sampling a length-$m$ iid vector $[x_1, \ldots, x_m]$, each of whose elements is distributed identically to $x$, you can slice your joint samples to get the samples you want.

Think of the history $[v_1, \ldots, v_m]$ in section 5.1 as current values of $[x_1, \ldots, x_m]$. If you update the $i$’th entry by a Metropolis step with a Gaussian proposal, you can choose the proposal covariance as in section 5.1 without violating the Markov property. So you still have a valid Metropolis updating scheme.
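A minimal sketch of that update, with every detail (target, dimensions, step scale) invented for illustration: each copy takes a Metropolis step whose symmetric Gaussian proposal direction is a random weighted sum of the other copies’ centered values, so the proposal covariance adapts without ever being computed explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target: a correlated 2-d Gaussian (chosen only for illustration).
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
logp = lambda x: -0.5 * x @ prec @ x

m, d = 11, 2
state = rng.standard_normal((m, d))  # m current values of x
accepted = 0
for sweep in range(2000):
    for i in range(m):
        # Proposal direction: random weighted sum of the OTHER copies'
        # centered values -- Gaussian, symmetric about state[i], with
        # covariance set by the others' spread, never formed explicitly.
        others = np.delete(state, i, axis=0)
        centered = others - others.mean(axis=0)
        dx = rng.standard_normal(m - 1) @ centered
        prop = state[i] + 0.3 * dx  # 0.3 is an arbitrary step scale
        if np.log(rng.uniform()) < logp(prop) - logp(state[i]):
            state[i] = prop
            accepted += 1
print(accepted / (2000.0 * m))  # acceptance rate
```

This only compiles the mechanics of the update; whether the adapting proposal covariance preserves the intended stationary distribution is exactly the question at issue.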

4. Anand Patil

Since you’re spacing your trial points nonuniformly (the `exp(.1*i)` factor in `step`), do you need to weight the logp’s of the x_primes by the Jacobian of the trial point ‘distribution’?

Also, is it possible to replace the grid search along the line with a Metropolis step along the same line, to save log-probability evaluations?

5. Anand Patil

Not the Jacobian, just the density. Got confused there.

6. Hi Anand,

For your argument adapting Bertsimas and Vempala’s section 5.1, I agree that this is a Markovian update, but how do you know that choosing this strange covariance matrix still yields the intended stationary distribution?

I think you’re right about the need for a correction term due to my sampling. The theoretical version that chooses from the line is so simple. I was trying to come up with an approximation that only evaluated a limited number of points.

Replacing the grid search with a Metropolis step would make this basically AM, right? There is supposed to be a benefit to the Hit-and-Run part, at least theoretically, where it can get out of a tight corner that Metropolis can’t. Maybe taking a bunch of Metropolis steps along the same one-dimensional subspace would have all of the benefits, with no need for complicated weighting.

There is the additional benefit of the grid search in my cross-diagonal example, where the walk can tunnel through a low-probability region to get to another high-probability part. But I wouldn’t count on this working well for distributions with many modes in higher dimensions.
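Here is a toy version of that tunnelling behaviour (the bimodal target and all constants below are invented for illustration): exponentially spaced trial points along a line, sampled in proportion to their density, put real probability mass on a far mode across a low-probability gap, where a local random-walk step essentially never would.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-d bimodal target: equal Gaussian modes at -4 and +4.
logp = lambda x: np.logaddexp(-0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2)

x0, dx = -4.0, 1.0  # start in the left mode; trial direction along the line
# Exponentially spaced offsets in both directions, as in the step method.
deltas = np.array([s * np.exp(0.1 * i) for s in (-1, 1) for i in range(25)])
trials = np.concatenate([[x0], x0 + deltas * dx])

# Sample trial points with probability proportional to their density.
lp = logp(trials)
probs = np.exp(lp - np.logaddexp.reduce(lp))
picks = trials[rng.choice(len(trials), size=10_000, p=probs)]
print((picks > 0).mean())  # fraction of draws landing in the right mode
```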

7. Sounds a lot like Differential Evolution samplers:

Provides the basis for the DREAM_ZS extension (also see the second paper):
C.J.F. ter Braak and J.A. Vrugt, “Differential evolution Markov chain with snooker updater and fewer chains,” Statistics and Computing, 18(4), 435–446, doi:10.1007/s11222-008-9104-9, 2008.

Introduces the original DREAM idea:
J.A. Vrugt, C.J.F. ter Braak, C.G.H. Diks, D. Higdon, B.A. Robinson, and J.M. Hyman, “Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling,” International Journal of Nonlinear Sciences and Numerical Simulation, 10(3), 273–290, 2009.

8. Thanks for the refs, @jsalvatier. I hope to have time to take a look soon. Does “differential” mean that they rely on the derivative of the likelihood (or posterior or something)?