Outer Optimizers

Maturity: beta — see Feature Maturity for what this means.

The outer optimizer is the algorithm that minimises the population objective function (OFV) over the transformed parameter vector [log θ, chol(Ω), log σ].

It only applies to methods that have a single outer-loop minimisation over that vector:

method	does `optimizer` apply?
`foce` / `focei`	yes — picks the algorithm for the population OFV minimisation
`gn` (pure Gauss-Newton)	no — GN runs its own Levenberg-Marquardt loop
`gn_hybrid`	yes for the FOCEI polish phase only; the preceding GN phase ignores it
`saem`	no — SAEM has no single outer minimisation; the M-step uses a hardcoded NLopt algorithm and is not user-selectable. Setting `optimizer` under `method = saem` triggers a warning and is otherwise ignored.
`[saem, focei]` (chain)	applies to the FOCEI polish stage only
`imp`	no — IS is a sampling pass, not an optimisation

Set via [fit_options]:

[fit_options]
  method    = focei
  optimizer = auto

The default optimizer = auto picks the algorithm per model (see auto below); name a specific optimizer to override it.

Available optimizers

Automatic selection

Key	Algorithm	Notes
`auto`	Per-model selection	Default. Resolves to `nlopt_lbfgs` when the exact analytic FOCE/FOCEI gradient is available, and `bobyqa` when only finite differences are. The fit output records the resolved pick as `auto (<resolved>)`. See The `auto` default.

NLopt algorithms

Key	Algorithm	Notes
`bobyqa`	Bounded Optimization BY Quadratic Approximation	Derivative-free quadratic trust-region; what `auto` selects on FD-only fits. Avoids the fixed-EBE gradient bias that stalls gradient methods on ill-conditioned fits; consistently reaches a lower OFV on ODE/PD models, sparse data, and Hill-ridge problems without reconverging the inner loop.
`slsqp`	Sequential Least Squares Programming	Gradient-based, handles bounds well. Fast on well-conditioned analytical PK models; can stall above the true minimum on ODE/PD / sparse-data fits unless paired with `reconverge_gradient_interval = 1` (5–6× cost).
`nlopt_lbfgs`	L-BFGS via NLopt	Limited-memory BFGS. Fast on analytical 1-/2-/3-cpt FOCE/FOCEI fits (exact analytic gradient); also useful for high-parameter-count models.
`mma`	Method of Moving Asymptotes	Alternative constrained gradient optimizer. Rarely needed.

Deprecated aliases

Key	Algorithm	Notes
`bfgs`	→ `nlopt_lbfgs`	Deprecated alias. Now selects NLopt L-BFGS.
`lbfgs`	→ `nlopt_lbfgs`	Deprecated alias. Now selects NLopt L-BFGS.

Note

bfgs and lbfgs previously selected a hand-rolled built-in BFGS / limited-memory L-BFGS. Benchmarking showed that path was strictly worse than NLopt L-BFGS — 3–5× slower on analytic-gradient models and prone to diverging or hanging on harder problems (#483) — so both keys are now aliases for nlopt_lbfgs. The built-in implementation is slated for removal.

Newton trust-region

Key	Algorithm	Notes
`trust_region`	Newton trust-region with Steihaug CG	Uses the analytic outer gradient and a BHHH approximate Hessian `H ≈ 4 Σ gᵢgᵢᵀ` (always positive semi-definite). The CG budget defaults to `ceil(sqrt(n_params)).clamp(5, n_params)`; pin it with `steihaug_max_iters`. Best combined with `inits_from_nca` since it benefits from good starting values.

When to use which

auto (default) — leave this in place for most fits. It inspects the model and picks nlopt_lbfgs when the exact analytic FOCE/FOCEI gradient is available and bobyqa when only finite differences are — the two choices that came out fastest-and-most-reliable in the #490 benchmarking (see The auto default). Override it only when you have a model-specific reason to pin a particular optimizer.

bobyqa — what auto selects on FD-only problems, and a fine explicit choice there. Derivative-free quadratic trust-region; re-evaluates EBEs at every trial point so it avoids the fixed-EBE gradient bias that stalls gradient-based optimizers on ill-conditioned fits. On the cefepime 2-cpt benchmark below it reaches a lower OFV than slsqp (even reconverged at full cost) in less wall time. Works equally well on smooth analytical PK models — converges more slowly than gradient methods per iteration but each iteration is cheap (no FD gradient sweep).

slsqp — switch to this when bobyqa is too slow on a smooth, well-conditioned model with many parameters (it can need many quadratic-interpolation samples to triangulate a high-dimensional surface). Gradient-based, handles box constraints cleanly; pair with reconverge_gradient_interval = 1 if it stalls above an expected OFV.

trust_region — for models with many parameters (large θ + Ω) or when combined with inits_from_nca. The second-order curvature information helps when starting values are already in the basin.

gn / gn_hybrid — these are Gauss-Newton estimation methods, not outer optimizers in the same sense. They replace the FOCE outer loop entirely rather than selecting an algorithm within it. (gn_hybrid polishes via FOCEI and inherits the optimizer setting for that stage — so the polish runs with the auto-selected optimizer unless overridden.)

lbfgs / nlopt_lbfgs — strong choices on analytical 1-/2-/3-cpt FOCE/FOCEI fits, where they get an exact bias-free gradient (see Analytic FOCE / FOCEI gradient) that reaches the true optimum at wall-time comparable to bobyqa (and several × faster than the now-deprecated built-in BFGS). Since #410 they also get the exact gradient on in-scope user-[odes] models (bolus/infusion dosing, ObsCmt/Form-C/per-CMT readout, F, resets, static covariates, and — since #439 — time-varying covariates on bolus dosing, steady-state dosing #473, IOV #466, IOV combined with an ExpressionScale obs_scale = expr divisor #575, and IOV + time-varying covariates + ExpressionScale #590), so lbfgs is a good choice there too; out-of-scope ODE features (input_rate/SDE, LTBS combined with an ExpressionScale divisor, non-IOV TV-covariates combined with an ExpressionScale divisor, IOV combined with LTBS, and TV-covariates combined with infusion/SS/reset) still fall back to the finite-difference gradient and its fixed-EBE bias — prefer bobyqa for those. mma is rarely needed.

Why these two? bobyqa doesn’t use the gradient, so it never sees the fixed-EBE FD gradient bias that drives gradient-based optimizers to local minima hundreds of OFV units above the true optimum on ODE/PD models and sparse data (the Emax PKPD benchmark in saem.md and the cefepime validation below). On analytical PK models the gradient is exact and bias-free, and nlopt_lbfgs exploits it to reach the optimum faster than the derivative-free search. auto (default since #490) routes each model to the right one of the two automatically; before auto the standalone default was bobyqa, and before that slsqp.

The `auto` default

optimizer = auto (the default) resolves to a concrete optimizer from the model, with no user input:

nlopt_lbfgs — when the exact analytic FOCE/FOCEI gradient is available: an analytical PK model (or an in-scope user-[odes] model, see Analytic FOCE / FOCEI gradient) that is in the sensitivity provider’s scope and is not forced onto finite differences via gradient = fd.
bobyqa — otherwise: out-of-scope ODE/PD models, log-transform-both-sides (LTBS), SDE models, or any fit with gradient = fd, where the outer loop runs on finite differences.

The choice mirrors the analytic-vs-FD decision the outer loop already makes, so auto lands on the gradient-based optimizer exactly when there is an exact gradient to feed it. Limited benchmarking across ~10 real FOCEI datasets (#490) found nlopt_lbfgs fastest-to-optimum on every analytic-gradient problem (bobyqa was faster but often stopped short of the optimum, slsqp converged but much slower), while bobyqa was both fastest and most reliable when only finite differences were available (nlopt_lbfgs/slsqp/mma were very slow and often missed the basin).

The resolved optimizer is recorded in the fit output as auto (<resolved>) — e.g. auto (nlopt_lbfgs) — so a run records which algorithm actually executed. These defaults may be refined as the analytic-gradient scope grows and more datasets are studied; pin optimizer explicitly to opt out of the automatic choice.

Fixed-EBE gradient bias and `reconverge_gradient_interval`

By default, gradient-based optimizers (slsqp, lbfgs, etc.) hold each subject’s EBEs fixed while computing the population gradient. This is cheap but omits the response of the inner solution to θ and Ω. On ill-conditioned models the omitted term causes slsqp to stall well above the bobyqa optimum.

Set reconverge_gradient_interval = 1 to re-solve the inner loop at every gradient evaluation, recovering the full gradient surface at roughly 5–6× cost. A value of N reconverges every Nth evaluation and uses the cheap gradient in between — often enough to close most of the gap:

Configuration	OFV	Wall time
`slsqp` (fixed-EBE)	68,252	390 s
`slsqp` + `interval = 10`	66,118	633 s
`slsqp` + `interval = 1`	65,485	1,871 s
`bobyqa` (derivative-free, default)	65,598	315 s

On this cefepime 2-compartment dataset bobyqa reaches a near-optimal solution faster than any reconverged slsqp setting. The reconverged gradient is most useful when derivative-free search is too slow (high parameter count) or when a gradient optimizer is required for other reasons.

IOV models (kappa/block_kappa) always reconverge regardless of this setting. Even so, a pure slsqp cold-start on an IOV model can terminate a few OFV units above the minimum, and the exact stopping point is platform-dependent (the re-converged FD gradient’s summation order differs across architectures — issue #160). Prefer the default bobyqa or a methods = [saem, focei] chain for IOV fits; both reach the minimum platform-independently. See Inter-Occasion Variability.

Analytic FOCE / FOCEI gradient (analytical PK models)

For analytical 1-, 2-, and 3-compartment models (IV bolus/infusion, oral, and steady state) under FOCE or FOCEI, the gradient-based optimizers (bfgs, lbfgs, nlopt_lbfgs, slsqp) automatically switch from the finite-difference gradient to an exact closed-form marginal gradient (Almquist et al. 2015). FOCEI differentiates the Laplace marginal; FOCE differentiates ferx’s Sheiner–Beal linearized marginal. Both are computed analytically through second-order dual numbers and include the full EBE response (the term reconverge_gradient_interval recovers by brute force) — so they do not carry the fixed-EBE bias described above, at no extra cost. There is nothing to configure: the method and optimizer choices are the switch, and any model outside the analytical scope falls back to the finite-difference gradient transparently.

Because the gradient is both exact and bias-free here, lbfgs / nlopt_lbfgs are strong choices on these models: each gradient evaluation is a closed form rather than an O(n) finite-difference sweep, and — unlike the FD gradient — it carries no fixed-EBE bias, so the optimizer reaches the true optimum. Wall-time versus the derivative-free bobyqa default is model-dependent and roughly comparable (e.g. warfarin FOCEI: lbfgs ≈ 0.29 s vs bobyqa ≈ 0.43 s; on a 2-cpt fit bobyqa is faster but can stop short of the optimum), and both are several times faster than the now-deprecated built-in BFGS.

The exact gradient lands on the same optimum as the underlying objective, validated against NONMEM on the warfarin 1-cpt oral model: FOCE reaches OFV −280.36 (NONMEM −280.36; TVCL 0.1330, TVKA 0.7252, ω²: 0.0286 / 0.00958 / 0.349) and FOCEI reaches −286.00 (NONMEM −286.00) — both agreeing to ~4–5 significant figures across θ, Ω, and σ.

The exact gradient also covers log-transform-both-sides (log(DV) ~ additive(...)) and constant output scaling ([scaling] obs_scale = k): the analytic provider applies the g = ln(f) jet transform (value, gradient, and Hessian) and the constant divisor in closed form. Validated against NONMEM on the warfarin LTBS model — the gradient-based lbfgs path reaches OFV −675.302 and recovers NONMEM’s MLE (TVCL 0.1327, TVV 7.738, TVKA 0.811) to ~4 significant figures.

The fallback (finite-difference) gradient is used when the model has: an ODE system, inter-occasion variability (kappa), a dose lagtime, time-varying covariates, system resets, expression/per-compartment output scaling, or an overlapping steady-state infusion (T_inf > II, #379). For those, the guidance above (prefer bobyqa, or reconverge slsqp) still applies.

Global pre-search

Enable global_search = true to run a derivative-free CRS2-LM pre-search before the local optimizer. This helps escape local minima on complex models:

[fit_options]
  method        = focei
  global_search = true
  global_maxeval = 2000   # auto-scaled if omitted

The pre-search uses NLopt’s Controlled Random Search and requires lower/upper bounds on all theta parameters. It hands off to the local optimizer (optimizer key) once the budget is exhausted.

See also fit-options.md for the full list of optimizer-related options.