Quasi restricted maximum likelihood (qREML) algorithm for models with penalised splines or simple i.i.d. random effects
This algorithm can be used very flexibly to fit statistical models that involve penalised splines or simple i.i.d. random effects, i.e. that have penalties of the form $$0.5 \sum_{i} \lambda_i b_i^T S_i b_i,$$ with smoothing parameters \(\lambda_i\), coefficient vectors \(b_i\), and fixed penalty matrices \(S_i\).
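To make the penalty form concrete, the following base R sketch evaluates the sum above for several coefficient vectors stored as rows of one matrix, all sharing a single penalty matrix. The helper name quad_penalty and the toy values are assumptions made for illustration; this is not the package's penalty function, which should be used inside the actual likelihood.

```r
# Hypothetical helper (not the package's penalty function): evaluates
# 0.5 * sum_i lambda_i * b_i' S b_i for a coefficient matrix B whose rows
# are the individual coefficient vectors b_i, all sharing one penalty matrix S.
quad_penalty <- function(B, S, lambda) {
  if (is.null(dim(B))) B <- matrix(B, nrow = 1)  # treat a vector as one row
  stopifnot(ncol(B) == nrow(S), nrow(B) == length(lambda))
  quad_forms <- rowSums((B %*% S) * B)           # b_i' S b_i for each row i
  0.5 * sum(lambda * quad_forms)
}

# Two coefficient vectors of length three and a ridge-type penalty matrix
B <- rbind(c(1, 2, 3), c(0, 1, 0))
S <- diag(3)
pen <- quad_penalty(B, S, lambda = c(2, 4))
print(pen)  # 0.5 * (2 * 14 + 4 * 1) = 16
```

With an identity penalty matrix each quadratic form reduces to a sum of squares, which makes the result easy to check by hand.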
The qREML algorithm is typically much faster than REML or marginal ML using the full Laplace approximation method, but may be slightly less accurate regarding the estimation of the penalty strength parameters.
Under the hood, qreml uses the R package RTMB for automatic differentiation in the inner optimisation. The user has to specify the penalised negative log-likelihood function pnll, structured as dictated by RTMB, and use the penalty function to compute the quadratic-form penalty inside the likelihood.
Arguments
- pnll
penalised negative log-likelihood function that is structured as dictated by RTMB and uses the penalty function from LaMa to compute the penalty. Needs to be a function of the named list of initial parameters par only.
- par
named list of initial parameters.
The random effects/spline coefficients can be vectors or matrices, the latter summarising several random effects of the same structure, each one being a row in the matrix.
- dat
initial data list that contains the data used in the likelihood function, hyperparameters, and the initial penalty strength vector.
If the initial penalty strength vector is not called lambda, the name it has in dat needs to be specified using the psname argument below. Its length needs to match the total number of random effects.
- random
vector of names of the random effects/penalised parameters in par.
Caution: The ordering of random needs to match the order of the random effects passed to penalty inside the likelihood function.
- psname
optional name given to the penalty strength parameter in dat. Defaults to "lambda".
- alpha
optional hyperparameter for exponential smoothing of the penalty strengths.
For larger values, smoother convergence is to be expected, but the algorithm may need more iterations.
- smoothing
optional scaling factor for the final penalty strength parameters.
Increasing this beyond one leads to a smoother final model. Can be an integer or a vector of length equal to the length of the penalty strength parameter.
- maxiter
maximum number of iterations in the outer optimisation over the penalty strength parameters.
- tol
Convergence tolerance for the penalty strength parameters.
- control
list of control parameters for optim to use in the inner optimisation. Here, optim uses the BFGS method, which cannot be changed. We advise against changing the default values of reltol and maxit, as this can decrease the accuracy of the Laplace approximation.
- silent
integer silencing level: 0 corresponds to full printing of inner and outer iterations, 1 to printing of outer iterations only, and 2 to no printing.
- joint_unc
logical, if TRUE, a joint RTMB object is returned, allowing for joint uncertainty quantification.
- saveall
logical, if TRUE, then all model objects from each iteration are saved in the final model object.
- epsilon
vector of two values specifying the cycling detection parameters. If the relative change of the new penalty strength to the previous one is larger than epsilon[1] but the change to the one before is smaller than epsilon[2], the algorithm will average the two last values to prevent cycling.
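The roles of alpha, epsilon and tol in the outer iteration can be sketched in base R as follows. This is an illustration of the behaviour described in the argument list, not the package's internal code: the helper names (smooth_update, is_cycling, has_converged), the exact blending formula, the use of a relative change for the convergence check, and all numeric values are assumptions.

```r
# Illustrative only: these helpers are assumptions, not the package internals.

# Exponential smoothing: blend the newly proposed penalty strengths with the
# previous ones; alpha = 0 accepts the proposal unchanged.
smooth_update <- function(lambda_prev, lambda_prop, alpha = 0) {
  alpha * lambda_prev + (1 - alpha) * lambda_prop
}

# Cycling check as described for epsilon: large relative change to the
# previous value, but small relative change to the value before that.
is_cycling <- function(lambda_new, lambda_prev, lambda_prev2, epsilon) {
  change1 <- abs(lambda_new - lambda_prev) / lambda_prev
  change2 <- abs(lambda_new - lambda_prev2) / lambda_prev2
  change1 > epsilon[1] & change2 < epsilon[2]
}

# Convergence check on the penalty strengths (relative change assumed)
has_converged <- function(lambda_new, lambda_prev, tol) {
  max(abs(lambda_new - lambda_prev) / abs(lambda_prev)) < tol
}

# Toy history of two penalty strengths over three outer iterations
lambda_prev2 <- c(1.00, 0.50)
lambda_prev  <- c(2.00, 0.49)
lambda_prop  <- c(1.01, 0.48)

# Smoothed update with a hypothetical alpha of 0.5
lambda_new <- smooth_update(lambda_prev, lambda_prop, alpha = 0.5)

# The first component jumps back towards its value two steps ago, so it is
# flagged and replaced by the average of the last two values.
cycling <- is_cycling(lambda_prop, lambda_prev, lambda_prev2,
                      epsilon = c(0.2, 0.05))
lambda_avg <- ifelse(cycling, (lambda_prop + lambda_prev) / 2, lambda_prop)

converged <- has_converged(lambda_avg, lambda_prev, tol = 1e-4)
```

In this toy history only the first penalty strength oscillates, so only that component is averaged, and the large remaining change means the convergence check fails and the outer loop would continue.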
See also
penalty to compute the penalty inside the likelihood function
Examples
data = trex[1:1000,] # subset
# initial parameter list
par = list(logmu = log(c(0.3, 1)),                     # step mean
           logsigma = log(c(0.2, 0.7)),                # step sd
           beta0 = c(-2,2),                            # state process intercept
           betaspline = matrix(rep(0, 18), nrow = 2))  # state process spline coefs
# data object with initial penalty strength lambda
dat = list(step = data$step,    # step length
           tod = data$tod,      # time of day covariate
           N = 2,               # number of states
           lambda = rep(10,2))  # initial penalty strength
# building model matrices
modmat = make_matrices(~ s(tod, bs = "cp"),
                       data = data.frame(tod = 1:24),
                       knots = list(tod = c(0,24))) # wrapping points
dat$Z = modmat$Z # spline design matrix
dat$S = modmat$S # penalty matrix
# penalised negative log-likelihood function
pnll = function(par) {
  getAll(par, dat) # makes everything contained available without $
  Gamma = tpm_g(Z, cbind(beta0, betaspline), ad = TRUE) # transition probabilities
  delta = stationary_p(Gamma, t = 1, ad = TRUE) # initial distribution
  mu = exp(logmu) # step mean
  sigma = exp(logsigma) # step sd
  # calculating all state-dependent densities
  allprobs = matrix(1, nrow = length(step), ncol = N)
  ind = which(!is.na(step)) # only for non-NA obs.
  for(j in 1:N) allprobs[ind,j] = dgamma2(step[ind],mu[j],sigma[j])
  -forward_g(delta, Gamma[,,tod], allprobs, ad = TRUE) +
    penalty(betaspline, S, lambda) # this does all the penalization work
}
# model fitting
mod = qreml(pnll, par, dat, random = "betaspline")
#> Creating AD function
#> Initialising with lambda: 10 10
#> outer 1 - lambda: 4.909 4.287
#> outer 2 - lambda: 2.675 2.463
#> outer 3 - lambda: 1.608 1.716
#> outer 4 - lambda: 1.068 1.319
#> outer 5 - lambda: 0.784 1.067
#> outer 6 - lambda: 0.632 0.888
#> outer 7 - lambda: 0.55 0.75
#> outer 8 - lambda: 0.506 0.639
#> outer 9 - lambda: 0.482 0.547
#> outer 10 - lambda: 0.471 0.469
#> outer 11 - lambda: 0.465 0.402
#> outer 12 - lambda: 0.464 0.345
#> outer 13 - lambda: 0.464 0.297
#> outer 14 - lambda: 0.466 0.256
#> outer 15 - lambda: 0.468 0.223
#> outer 16 - lambda: 0.471 0.195
#> outer 17 - lambda: 0.474 0.173
#> outer 18 - lambda: 0.476 0.155
#> outer 19 - lambda: 0.478 0.141
#> outer 20 - lambda: 0.48 0.13
#> outer 21 - lambda: 0.482 0.122
#> outer 22 - lambda: 0.484 0.115
#> outer 23 - lambda: 0.485 0.109
#> outer 24 - lambda: 0.486 0.105
#> outer 25 - lambda: 0.487 0.102
#> outer 26 - lambda: 0.488 0.1
#> outer 27 - lambda: 0.489 0.098
#> outer 28 - lambda: 0.489 0.096
#> outer 29 - lambda: 0.489 0.095
#> outer 30 - lambda: 0.49 0.094
#> outer 31 - lambda: 0.49 0.094
#> outer 32 - lambda: 0.49 0.093
#> outer 33 - lambda: 0.49 0.093
#> outer 34 - lambda: 0.49 0.092
#> outer 35 - lambda: 0.49 0.092
#> outer 36 - lambda: 0.491 0.092
#> outer 37 - lambda: 0.491 0.092
#> outer 38 - lambda: 0.491 0.092
#> Converged
#> Final model fit with lambda: 0.491 0.092