I haven’t had time to say much recently, due to some travel for work, but I did have a chance to prototype the Hit-and-Run/Adaptive Metropolis approach to MCMC that I mentioned in my last post. Was that really three weeks ago? How time flies.
Anyway, thanks to the tip Aram pointed out in the comments, Hit-and-Run can take steps in the correct direction without explicitly computing an approximation of the covariance matrix, just by taking a randomly weighted sum of random points. It’s nearly magic, although Aram says that it actually makes plenty of sense. The code for this “H-RAM” looks pretty similar to my original Hit-and-Run step method, and it’s short enough that I’ll just show it to you:
    from numpy import array, dot, exp, zeros
    from pymc import Gibbs, Metropolis, ZeroProbability, flib, rcategorical, rnormal

    class HRAM(Gibbs):
        def __init__(self, stochastic, proposal_sd=None, verbose=None):
            Metropolis.__init__(self, stochastic, proposal_sd=proposal_sd,
                                verbose=verbose, tally=False)
            self.proposal_tau = self.proposal_sd**-2.
            self.n = 0
            self.N = 11
            # store N points near the current value; their empirical spread
            # determines the step directions below
            self.value = rnormal(self.stochastic.value, self.proposal_tau,
                                 size=tuple([self.N] + list(self.stochastic.value.shape)))

        def step(self):
            x0 = self.value[self.n]
            u = rnormal(zeros(self.N), 1.)
            # a randomly weighted sum of the stored points gives a direction
            # with the right covariance, with no explicit covariance matrix
            dx = dot(u, self.value)

            self.stochastic.value = x0
            logp = [self.logp_plus_loglike]
            x_prime = [x0]

            # grid search along the line through x0 in direction dx,
            # with exponentially spaced trial points in both directions
            for direction in [-1, 1]:
                for i in xrange(25):
                    delta = direction*exp(.1*i)*dx
                    try:
                        self.stochastic.value = x0 + delta
                        logp.append(self.logp_plus_loglike)
                        x_prime.append(x0 + delta)
                    except ZeroProbability:
                        self.stochastic.value = x0

            # choose among the trial points with probability proportional
            # to their posterior density
            i = rcategorical(exp(array(logp) - flib.logsum(logp)))
            self.value[self.n] = x_prime[i]
            self.stochastic.value = x_prime[i]

            if i == 0:
                self.rejected += 1
            else:
                self.accepted += 1

            self.n += 1
            if self.n == self.N:
                self.n = 0
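To see why the weighted sum points in the right direction: stack the stored points as the rows y_1, …, y_N of a matrix Y, so that dx = dot(u, self.value) is Y^T u with u ~ N(0, I_N). Conditional on the history,

$$dx \mid Y \sim \mathcal{N}\!\left(0,\; Y^T Y\right), \qquad Y^T Y = \sum_{i=1}^N y_i y_i^T,$$

so the proposal direction automatically has the empirical (uncentered) second-moment matrix of the stored points as its covariance, without that matrix ever being formed explicitly.

And a minimal usage sketch, assuming a PyMC 2 model with a vector-valued stochastic X (the names here are illustrative):

    import pymc as mc

    # X is a hypothetical vector-valued stochastic from some model
    model = mc.MCMC([X])
    model.use_step_method(HRAM, X, proposal_sd=.1)
    model.sample(iter=10000)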
Compare the results to the plain old Hit-and-Run step method:
Abie,
Great stuff! Two questions. First, how hard is it to show that H-RAM samples from the target distribution? Second, could the same trick for avoiding explicit covariance computation work for PyMC’s standard Adaptive Metropolis?
Anand
Hi Anand, have a look at the section that Aram pointed me to in his comment here. In short, it seems tremendously simple to show that the stationary distribution is as intended for the H-RAM, but when I tried my hand at extending it to the “standard” Adaptive Metropolis, I got totally confused.
It seems like it should work for AM, though. Pretty cool if it does, huh?
Hehe, section 5.1 is vastly less intimidating than most of the paper… I guess I didn’t need to procrastinate. Do you buy this as an argument for its validity with the standard (wrt PyMC) AM:
Say you’re trying to sample a variable x. If you solve the problem of sampling a length-m iid vector (x_1, …, x_m), each of whose elements is distributed identically to x, you can slice your joint samples to get the samples you want. Think of the history in section 5.1 as the current values of x_1, …, x_m. If you update the i’th entry by a Metropolis step with a Gaussian proposal, you can choose the proposal covariance as in section 5.1 without violating the Markov property. So you still have a valid Metropolis updating scheme.
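Here is a minimal sketch of that ensemble idea, with a generic log_p standing in for the model’s log-posterior (names here are illustrative, not PyMC internals). The proposal for copy i uses only the other copies, so its covariance doesn’t depend on the value being updated:

    import numpy as np

    def ensemble_am_step(X, log_p, rng=np.random):
        """One sweep over an ensemble X of shape (m, d)."""
        m, d = X.shape
        for i in range(m):
            others = np.delete(X, i, axis=0)
            # empirical covariance of the rest of the ensemble, with a
            # little jitter to keep the matrix positive definite
            C = np.cov(others, rowvar=False) + 1e-6 * np.eye(d)
            x_new = rng.multivariate_normal(X[i], C)
            # symmetric proposal, so the plain Metropolis ratio applies
            if np.log(rng.uniform()) < log_p(x_new) - log_p(X[i]):
                X[i] = x_new
        return X

Since each entry’s proposal is a symmetric Gaussian whose parameters don’t depend on that entry, each update is an ordinary Metropolis step on the product target.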
In line 21, since you’re spacing your trial points nonuniformly (the exp(.1*i) factor in the delta computation), do you need to weight the logp’s of the x_primes by the Jacobian of the trial point ‘distribution’?
Also, is it possible to replace the grid search along the line in lines 19-31 (the double loop over directions and trial points) with a Metropolis step along the same line, to save log-probability evaluations?
Not the Jacobian, just the density. Got confused there.
Hi Anand,
For your argument adapting Bertsimas and Vempala’s section 5.1, I agree that this is a Markovian update, but how do you know that choosing this strange covariance matrix still yields the intended stationary distribution?
I think you’re right about the need for a correction term, due to the way I space my trial points. The theoretical version, which chooses from the whole line, is so simple; I was trying to come up with an approximation that only evaluates a limited number of points.
Replacing the grid search with a Metropolis step would make this basically AM, right? There is supposed to be a benefit to the Hit-and-Run part, at least theoretically, where it can get out of a tight corner that Metropolis can’t. Maybe taking a bunch of Metropolis steps along the same one-dimensional subspace would have all of the benefits, but no need for complicated weighting.
There is the additional benefit of the grid search in my cross-diagonal example, where the walk can tunnel through a low-probability region to get to another high-probability part. But I wouldn’t count on this working well for distributions with many modes in higher dimensions.
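For concreteness, here is a minimal sketch of the Metropolis-steps-along-the-line idea, with a generic log_p standing in for the model’s log-posterior:

    import numpy as np

    def metropolis_along_line(x0, dx, log_p, n_steps=5, step_sd=1., rng=np.random):
        """Take a few Metropolis steps restricted to the line x0 + t*dx."""
        x, lp = x0, log_p(x0)
        for _ in range(n_steps):
            # symmetric one-dimensional proposal along the shared direction
            x_new = x + rng.normal(0., step_sd) * dx
            lp_new = log_p(x_new)
            if np.log(rng.uniform()) < lp_new - lp:
                x, lp = x_new, lp_new
        return x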
Sounds a lot like Differential Evolution samplers:

This one provides the basis for the DREAM_ZS extension (also see the second paper):

C.J.F. ter Braak and J.A. Vrugt, Differential evolution Markov chain with snooker updater and fewer chains, Statistics and Computing, 18(4), 435-446, doi:10.1007/s11222-008-9104-9, 2008.

This one introduces the original DREAM idea:

J.A. Vrugt, C.J.F. ter Braak, C.G.H. Diks, D. Higdon, B.A. Robinson, and J.M. Hyman, Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, International Journal of Nonlinear Sciences and Numerical Simulation, 10(3), 273-290, 2009.
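For reference, the core DE-MC proposal from ter Braak (2006), which these papers build on, perturbs chain i by a scaled difference of two other randomly chosen chains; a minimal sketch, with a generic log_p standing in for the posterior:

    import numpy as np

    def de_mc_step(X, log_p, rng=np.random):
        """One DE-MC sweep over an ensemble X of shape (m, d)."""
        m, d = X.shape
        gamma = 2.38 / np.sqrt(2. * d)  # scaling suggested by ter Braak (2006)
        for i in range(m):
            # pick two distinct chains other than i
            r1, r2 = rng.choice([j for j in range(m) if j != i], size=2, replace=False)
            # the proposal moves along the difference of two chains,
            # plus a little noise for ergodicity
            x_new = X[i] + gamma * (X[r1] - X[r2]) + rng.normal(0., 1e-6, size=d)
            if np.log(rng.uniform()) < log_p(x_new) - log_p(X[i]):
                X[i] = x_new
        return X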
Thanks for the refs, @jsalvatier. I hope to have time to take a look soon. Does “differential” mean that they rely on the derivative of the likelihood (or posterior or something)?
Pingback: MCMC in Python: Part II of PyMC Step Methods and their pitfalls | Healthy Algorithms