Sci.MCMC: MCMC Algorithms

This module implements Markov Chain Monte Carlo (MCMC) algorithms which are used to sample from a target density. All samplers operates on log-densities instead of densities both for efficiency and numerical stability reasons.

API

theta1, eps1, invmass1 = mcmc.nuts(rng, f, theta0, options)

Implements the No-U-Turn Sampler (NUTS) of Hoffman&Gelman[13]. This is an adaptive MCMC algorithm which belongs to the class of Hamiltonian Monte Carlo (HMC) samplers. As such the gradient of the log-density is required to carry out the simulation. The sci.diff module allows for the gradient to be automatically computed without approximations with minimal user effort.

The algorithm is implemented according to the paper mentioned above but additionally a diagonal (or optionally dense) mass matrix is estimated during the adaptation phase to improve the performance of the sampler. Similar improvements have been implemented in the Stan Project implementation of NUTS.

The argument rng is a prng object, the function f(x, grad) sets grad to the gradient of the log-density at x and returns the value of the log-density at x. Both x and grad must have the same length of theta0 which is the starting value of the Markov Chain (adaptation phase). The log-density must evaluate to a finite value at theta0. The argument options is a table and the algorithm makes use of the following string keys (defaults in square brackets):

Key Description
stopadapt [1024] number of adaptation samples, 0 or power of 2 >= 64
stop number of samples or function(theta)
olstat on-line statistical estimators
mass ["diagonal"] can be "diagonal" or "dense"
eps [rough guess] initial value for eps
cov [nil] initial covariance: densemass^-1 (matrix)
var [nil] initial variance: diagonalmass^-1 (vector)
delta [0.8] for adaptation algorithm, see NUTS paper
gamma [0.05] for adaptation algorithm, see NUTS paper
t0 [10] for adaptation algorithm, see NUTS paper
k [0.75] for adaptation algorithm, see NUTS paper
dlmax [1000] to cap tree growth, see deltamax in NUTS paper

The Markov Chain is started at theta0 and the adaptation takes place for the first stopadapt samples (stopadadapt may be 0). Then the parameters determining the evolution of the Markov Chain are fixed and the sampling phase follows. If stop is a number then stop samples are drawn from the chain. If stop is a function, each drawn sample is passed to stop and the algorithm terminates when stop evaluates to true. In any case olstat:push(theta) is called for every drawn sample theta (samples from the adaptation phase are excluded). Using the on-line statistical accumulators of sci.stat allows to estimate statistics of the target distribution. When the algorithm terminates it returns: the last drawn sample, the adapted eps and the adapted inverse mass (variance vector if the mass is diagonal or covariance matrix if the mass is dense). Notice that this information allows to re-start the chain if so desired by passing these returned values in the options argument and setting stopadapt to 0.

The mass option determines whether the algorithm uses a diagonal or dense mass matrix for the Hamiltonian dynamics. The eps option allows to provide an initial guess for the discretization step (otherwise a reasonable value is automatically determined). The options cov and var allows to specify an initial estimates for the mass matrix which is otherwise set equal to the identity matrix. We refer to the NUTS paper for the remaining options, the default values are perfectly adequate in most cases.

This example illustrates an use of mcmc.nuts() to sample from a simple bivariate density:


jit.opt.start("loopunroll=60") -- Set optimization flags:

-- Load required modules:
local math = require "sci.math".generic
local diff = require "sci.diff"
local alg  = require "sci.alg"
local prng = require "sci.prng"
local stat = require "sci.stat"
local mcmc = require "sci.mcmc"

local rng     = prng.std()
local N       = 50000                -- 50000 samples (after adaptation).
local theta0  = alg.tovec{ 0.5,0.5 } -- Starting value of chain.
local mystat  = stat.olcov(2)        -- Compute mean and covariance.

-- Logarithm of target density:
local function logpdf(x)
  return 2*math.log(x[1]) - x[1]*x[2]^2 - x[2]^2 + 2*x[2] - 4*x[1]
end
-- Automatic gradient computation (vector of 2 elements):
local dlogpdf = diff.gradientf(logpdf, 2)

local start = os.clock() -- To time the simulation.
mcmc.nuts(rng, dlogpdf, theta0, {
  stop   = N,
  olstat = mystat,
})
local mu, sig = alg.vec(2), alg.mat(2, 2)
mystat:mean(mu) -- Compute mean of target density.
mystat:cov(sig) -- Compute covariance of target density.
local time = os.clock() - start

print("mean:")
print(mu:width())
print("covariance:")
print(sig:width())
--> mean:
--> +0.650281,+0.637480
--> covariance:
--> +0.159994,-0.052265
--> -0.052265,+0.338427

print("CPU secs:", time) --> 1/4 of a second here with LuaJIT 2.1.