Fit Options
The optional [fit_options] block configures the estimation method and optimizer settings.
Syntax
[fit_options]
key = value
General Options
| Key | Values | Default | Description |
|---|---|---|---|
method |
foce, focei, saem, gn, gn_hybrid, impmap, imp (only as final chain stage) |
focei |
Estimation method (or single-stage method in a chain). See chained fits. When neither this key nor a wrapper’s method argument is set, the engine falls back to focei and emits a warning so the implicit choice is visible — set method here to silence it. |
maxiter |
integer | 500 |
Maximum outer loop iterations. maxiter = 0 is evaluation only (NONMEM MAXEVAL=0): the engine runs one inner EBE solve at the initial parameters and reports the objective there with no outer optimisation — useful for scoring a fixed parameter set or computing standard errors at known estimates (with covariance = true). The reported fit is marked not-converged, and the parameters are returned unchanged. |
covariance |
true, false |
true |
Compute covariance matrix and standard errors |
covariance_method |
r, s, rsr |
r |
Which covariance estimator to assemble, mirroring NONMEM $COVARIANCE MATRIX=. r = inverse Hessian R⁻¹ (model-based; the default). s = inverse score cross-product S⁻¹ (empirical information). rsr = the Huber–White sandwich R⁻¹SR⁻¹ (robust to model mis-specification; NONMEM’s default). ferx defaults to r, but NONMEM’s $COVARIANCE default is rsr — set covariance_method = rsr when reconciling SEs against a NONMEM run that used the default $COV. s/rsr are supported for FOCEI, FOCE, and IOV fits; all three anchored against NONMEM $COV MATRIX=S/RSR within ~10%. All methods require a positive-definite FD Hessian (a non-PD Hessian is not yet routed to the s cross-product). Requires covariance = true. |
covariance_fallback |
none, sir |
none |
What to do when the FD Hessian is not positive definite (non-PD). none leaves the covariance step as failed. sir runs SIR with a fallback proposal covariance built from the \|eigenvalue\|-rectified Hessian, inflated 4×; the YAML reports SIR-based 95% CIs instead of Hessian SEs. Requires covariance = true. Ignored when the Hessian succeeds. |
fd_hessian_step |
float | 1e-2 |
Initial relative step size for the finite-difference Hessian used in the covariance step. The actual perturbation for parameter i is fd_hessian_step * (1 + |x_hat[i]|). ferx automatically halves this value up to 8 times if any diagonal stencil evaluates to a non-finite value, so in most cases the default requires no tuning. Set explicitly only if the automatic reduction is insufficient (try 0.1) or if FD noise is the main concern on a very smooth surface (try 1e-3). Must be positive and finite. |
optimizer |
auto, bobyqa, slsqp, nlopt_lbfgs, mma, trust_region (plus deprecated aliases bfgs/lbfgs → nlopt_lbfgs) |
auto |
Outer-loop optimisation algorithm. Applies to method = foce / focei and to the FOCEI polish phase of method = gn_hybrid. The default auto picks per model: nlopt_lbfgs when the exact analytic FOCE/FOCEI gradient is available, and bobyqa when only finite differences are (ODE/PD models, LTBS/SDE, or gradient = fd) — see the Outer Optimizers page for the applicability matrix and the benchmarking (#490) behind these defaults. The fit output records the resolved pick (e.g. auto (nlopt_lbfgs)). Ignored (with a runtime warning) under method = saem / imp / gn. bfgs and lbfgs are deprecated aliases that now select nlopt_lbfgs (the hand-rolled built-in BFGS/L-BFGS is slated for removal — see #483). |
outer_xtol |
float | 1e-4 |
Relative step tolerance for the derivative-free bobyqa outer optimizer (NLopt xtol_rel): how far its trust radius must shrink to declare success. 1e-4 in scaled log-space is a ~0.01% move on the natural parameter scale. Only affects bobyqa (the auto pick for FD-objective models — ODE/PD, LTBS/SDE, gradient = fd, and TTE); gradient-based optimizers use their own fixed internal tolerances. Must be positive and finite. |
outer_ftol |
float | auto | Relative objective tolerance for the derivative-free bobyqa outer optimizer (NLopt ftol_rel): the relative OFV change below which it stops. Unset = auto: 1e-8 for a pure-TTE model and 1e-6 otherwise. The TTE tightening fixes #469 — on the near-flat frailty-ω² ridge of a nonlinear hazard parameter (the whole ridge spans <0.01 OFV) the historical 1e-6 stopped the optimizer short, leaving the variance above its own minimum (a Weibull shape-frailty read ω² 0.204 vs the NONMEM/nlmixr2 0.175 consensus; at 1e-8 it lands 0.176). The 1e-6 floor for non-TTE fits is deliberate: a TTE hazard objective is evaluated exactly, but on a noisy objective (ODE solver error, FD-inner FOCE such as LTBS) 1e-8 is unreachable, so BOBYQA grinds toward its maxeval budget instead of converging (≈3× the evaluations on an ODE fit). Set an explicit value to pin it for every model — e.g. outer_ftol = 1e-8 to tighten an LTBS flat-ridge fit, accepting the cost. Must be positive and finite. |
inner_maxiter |
integer | 200 |
Max iterations for the inner (per-subject EBE) optimizer |
inner_optimizer |
auto, bfgs, lbfgs, nelder_mead |
auto |
Inner-loop (per-subject EBE) optimisation algorithm. auto uses dense BFGS, switching to limited-memory L-BFGS above 32 random effects. The explicit values pin a single algorithm with no automatic switching: bfgs (dense), lbfgs (limited-memory two-loop recursion — cheaper per step at high random-effect dimension), nelder_mead (derivative-free, for diagnosing gradient issues). All gradient-based variants converge to the same EBE; the choice trades per-step cost against memory. Applies to every method with an inner loop (foce/focei/gn/gn_hybrid/saem/impmap). |
inner_tol |
float | 1e-5 |
Gradient-norm convergence tolerance for the inner (per-subject EBE) optimizer. A looser tolerance leaves residual noise in each subject’s EBE solution, which propagates into the marginal objective the outer optimizer sees; on models with a noisy or flat marginal surface (FD-inner FOCE such as log-transform-both-sides) that noise can make the derivative-free BOBYQA outer optimizer false-converge above the true minimum. 1e-5 removes enough of that noise to match NONMEM’s minimum, at roughly 1.5× the per-fit cost of the older 1e-4 default. (For reference, NONMEM runs its inner conditional optimization far tighter than its outer convergence: the inner precision SIGL defaults to ~10 significant digits, versus the outer NSIG/SIGDIGITS default of 3.) Tighter is not uniformly better: at 1e-6 some ill-conditioned fits over-converge the inner Hessian and the outer optimizer can land in a worse basin. Loosen it (e.g. 1e-4) to speed up well-conditioned fits where it makes no difference; change it further only with validation against a reference fit. |
cov_inner_tol |
float | inner_tol (LTBS: min(inner_tol, 1e-8)) |
Inner EBE-reconvergence tolerance used only by the covariance step, decoupled from inner_tol. The covariance R-matrix is a second-difference of the reconverged OFV and is more sensitive to EBE precision than the fit itself, so on a flat or weakly-identified surface an EBE converged only to the fit’s inner_tol can leave enough residual noise to perturb the standard errors (e.g. the heavily-censored M3 + IOV case in #654 — set cov_inner_tol = 1e-11). This lets the covariance step reconverge tighter without slowing every outer iteration. Unset (default) uses inner_tol for ordinary models (SEs unchanged) and min(inner_tol, 1e-8) for LTBS models, whose g = ln(f) covariance Hessian needs the tighter reconvergence. The default is deliberately not tightened for every model — over-converging some ill-conditioned inner Hessians (e.g. IOV block-Ω) drives the covariance indefinite — so opt in per model with this key. |
steihaug_max_iters |
integer | adaptive | Max CG iterations for the Steihaug subproblem (only used when optimizer = trust_region). Default (unset) uses ceil(sqrt(n_params)).clamp(5, n_params) — typically 5 for standard NLME models. Set explicitly (e.g. steihaug_max_iters = 50) to pin the budget. |
ode_reltol |
float | 1e-4 |
RK45 ODE solver relative tolerance. ODE models only (ignored for analytical PK). The default reproduces analytical closed forms in PRED to ~1e-4, but the FOCE objective amplifies solver error, so the OFV of an ODE-form model can differ from its analytical equivalent by several units. Set a tighter value (e.g. 1e-10) when the ODE-form OFV must match an analytical reference; expect slower fits. |
ode_abstol |
float | 1e-6 |
RK45 ODE solver absolute tolerance (companion to ode_reltol). ODE models only. |
ode_max_steps |
integer | 10000 |
Max RK45 steps per integration segment. ODE models only. Raise (e.g. 1000000) if a tight ode_reltol exhausts the step budget on stiff multi-compartment systems. |
global_search |
true, false |
false |
Run NLopt CRS2-LM (Controlled Random Search with Local Mutation) as a gradient-free global pre-search before the local optimizer. CRS2-LM samples within the parameter bounds; the local optimizer (e.g. bobyqa, slsqp) starts from the best point found. Useful for poorly-identified models — when the local optimizer can land in a degenerate basin (collapsed ETA, V/Q swap, parameters at bounds) from a far-from-truth start, the global pre-search usually escapes it. Adds the pre-search budget on top of the local optimisation, but typically more efficient than running multiple full fits from scratch. Requires a full NLopt build (e.g. brew install nlopt or apt install libnlopt-dev); a clear warning is emitted if CRS2-LM is unavailable. |
global_maxeval |
integer | 200 * (n_params + 1) |
Maximum evaluations of the FOCE objective during the global pre-search. Each eval is a full inner-loop pass over all subjects, so this is the dominant cost of global_search = true. The default (0 → auto) is empirically enough to escape bad basins on 10–20 parameter PK models without dominating the wall time of the subsequent local refine. |
bloq_method |
drop, m3 |
drop |
How to handle rows with CENS=1 (below LLOQ) or CENS=-1 (above ULOQ). m3 enables Beal’s M3 likelihood (see BLOQ example). |
npde_nsim |
integer ≥ 0 | 0 |
Number of Monte-Carlo replicates per subject used to compute the simulation-based NPDE/NPD diagnostics after the fit. 0 (default) disables the computation — no NPDE/NPD columns are emitted. A typical value is 1000. Cost scales linearly with the count. |
npde_seed |
integer | — | RNG seed for the NPDE/NPD simulation, for reproducible diagnostics. Unset falls back to a fixed default. Only used when npde_nsim > 0. |
mu_referencing |
true, false |
true |
Re-centre inner-loop ETA estimates on the current population mean (auto-detected from [individual_parameters]). See the FAQ entry for details. Set false to reproduce pre-automatic-mu behaviour. |
iov_column |
string | — | Name of the occasion column in the dataset (e.g. OCC). Required when the model uses kappa or block_kappa declarations. The column must contain integer occasion indices. Case-insensitive. Only supported with foce / focei — not saem. See IOV documentation. |
optimizer_trace |
true, false |
false |
Write a per-iteration CSV to /tmp/ferx_trace_<pid>_<ts>.csv. The path is stored in FitResult::trace_path. Useful for diagnosing convergence problems or comparing optimizers. See Optimizer Trace. |
reconverge_gradient_interval |
integer ≥ 0 | 0 |
How often to re-solve each subject’s inner EBE loop during the population gradient instead of holding the EBEs (η̂) and FOCE Hessian fixed. 0 (default) never reconverges — the cheap fixed-EBE gradient is used. 1 reconverges on every gradient evaluation; N reconverges on evaluations 0, N, 2N, … and uses the cheap gradient in between (amortizing the cost by ~N× while still periodically correcting the search direction). The fixed-EBE gradient omits the inner solution’s response to θ/Ω, so on ill-conditioned fits a gradient optimizer (e.g. slsqp) can stall well above the derivative-free (bobyqa) optimum; reconverging recovers the full surface at roughly 5–6× the per-gradient cost at interval 1. IOV models (kappa/block_kappa) always reconverge and ignore this setting. See the FOCE/FOCEI page. |
inits_from_nca |
true, false, nca, nca_sweep, nca_ebe |
false |
Derive NCA-based starting values from the data before the optimizer loop. true (alias for nca_sweep) and false/off toggle the default strategy; the three named values pick a strategy explicitly (see NCA-based starting values). Fixed thetas are never overwritten; covariate-effect thetas (no mu-referencing link) keep the model default. Most useful with method = trust_region or method = gn / method = gn_hybrid, where bad starting values can cause early stalling. The same estimation is available without running a fit via the CLI flag --inits-from-nca[=METHOD] and (in ferx-r) the ferx_inits_from_nca() function. |
NCA-based starting values
inits_from_nca estimates starting values directly from the data using non-compartmental analysis (NCA), then optionally refines parameters NCA cannot estimate. All three strategies are NCA-based; they differ only in how much refinement runs on top:
| Value | Strategy | What it does | Typical cost |
|---|---|---|---|
nca |
NCA only | Per-subject NCA (AUC, terminal slope, Wagner–Nelson Ka, biexponential peeling for 2/3-cpt) pooled to population geometric means. Leaves parameters NCA can’t reach (peripheral Q/V2, all ODE/PD thetas) at the model default. | < 5 ms |
nca_sweep |
NCA + sweep | Runs nca, then sweeps every remaining non-fixed theta over a log-space rRMSE grid using population predictions (etas = 0). Model-agnostic — also covers ODE/PD models. This is what true selects. |
< 500 ms (analytical) |
nca_ebe |
NCA + EBE sweep | Like nca_sweep but evaluates the grid with empirical Bayes estimates (etas ≠ 0); more accurate under large IIV (omega > ~0.2). Falls back to nca_sweep for ODE models. |
< 500 ms (analytical) |
The CL eta’s omega is also updated from inter-subject CV² when ≥ 3 subjects have a valid NCA estimate.
When nca_sweep is enabled but the fit fails to converge or the OFV looks suspiciously high, try nca_ebe.
Simulation-based diagnostics (NPDE / NPD)
Setting npde_nsim > 0 adds two simulation-based goodness-of-fit columns to the sdtab output: NPD (Normalized Prediction Discrepancies) and NPDE (Normalized Prediction Distribution Errors). Unlike CWRES — which linearises the model around the conditional mode and so inherits the bias of a first-order approximation — NPDE/NPD are built entirely from Monte-Carlo simulation under the fitted model, so they are robust to model nonlinearity and to non-Gaussian random effects (Brendel et al. 2006; Comets et al. 2008, the npde R package).
[fit_options]
method = focei
npde_nsim = 1000 # replicates per subject (off when 0, the default)
npde_seed = 12345 # optional, for reproducibilityThe effective seed actually used — the explicit npde_seed when given, otherwise the built-in default — is recorded as model: npde_seed: in the {model}-fit.yaml output (and round-trips through .fitrx), so the NPDE/NPD columns can always be regenerated from the saved fit, even when no seed was set in the model file. The line is omitted when NPDE did not run (npde_nsim = 0).
For each observation y_ij, ferx simulates K = npde_nsim replicates under the fitted θ/Ω/Σ (sampling η ~ N(0, Ω) and the residual error), evaluated at the subject’s own observation design:
- NPD (no decorrelation):
pd_ijis the empirical CDF of the simulated values at observationij, andnpd_ij = Φ⁻¹(pd_ij). - NPDE (decorrelated): within each subject the observed and simulated vectors are decorrelated with the empirical mean and Cholesky factor of the simulated covariance (the Brendel/Comets procedure) before the empirical CDF and inverse- normal transform. This removes the within-subject correlation that NPD ignores.
Empirical-CDF probabilities at 0 or 1 are clamped to [1/(2K), 1 − 1/(2K)] before Φ⁻¹, matching the npde-package convention, so the scores stay finite.
Under a correctly specified model, NPDE follows a standard normal distribution. On the warfarin example (examples/warfarin.ferx, data/warfarin.csv), a converged FOCEI fit with npde_nsim = 1000 gives the expected mean-zero, unit-variance distribution:
| Diagnostic | mean | variance | reference |
|---|---|---|---|
| NPDE | 0.006 | 1.02 | ≈ N(0, 1) |
| NPD | −0.002 | 0.88 | mean ≈ 0; variance < 1 because NPD retains within-subject correlation |
This matches the npde R package’s autonpde() (same Cholesky decorrelation and edge-clamping) to Monte-Carlo noise. See tests/npde_validation.rs for the engine-side check and an R reproduction recipe.
NONMEM cross-validation
NONMEM has no native NPDE output; the reference pipeline is a NONMEM $SIMULATION post-processed by the npde R package. The same warfarin data was fit with NONMEM 7.5.1 (ADVAN2 TRANS2, METHOD=1 INTER, proportional error) and the two fits agree, so any difference in the diagnostics is attributable to the NPDE computation rather than the fit:
| Quantity | ferx (FOCEI) | NONMEM 7.5.1 |
|---|---|---|
| TVCL | 0.1328 | 0.1327 |
| TVV | 7.738 | 7.738 |
| TVKA | 0.817 | 0.811 |
| OFV | −286.0015 | −286.0042 |
Feeding a 1000-replicate NONMEM $SIMULATION (under the NONMEM estimates) and the observed data to npde::autonpde() (Cholesky decorrelation, edge-clamping) reproduces ferx’s NPDE/NPD on the same 110 observations:
| Diagnostic | ferx mean | ferx var | npde-pkg mean | npde-pkg var |
|---|---|---|---|---|
| NPDE | 0.005 | 1.09 | 0.036 | 1.04 |
| NPD | −0.009 | 0.91 | −0.004 | 0.91 |
The NPD variance (the quantity that should sit below 1 because NPD retains the within-subject correlation) matches to three figures (0.91 vs 0.91); the small NPDE-variance gap is Monte-Carlo noise between two independent simulation draws. Both engines give NPDE ≈ N(0, 1) and a mean-zero NPD, as expected under this well-specified model.
Limitations. M3/BLQ censored observations need the predictive-CDF variant and are out of scope: censored rows (CENS != 0) are emitted as empty/NaN (matching the CWRES/IWRES convention), and a subject’s NPDE is NaN as a whole when it has any censored row, since the within-subject decorrelation would otherwise mix the LLOQ into the uncensored rows. NPDE also requires more replicates than a subject has observations (npde_nsim > n_obs) for a full-rank simulated covariance; otherwise that subject’s NPDE is NaN (NPD is still computed). For IOV (kappa) models the simulation holds kappas at zero (the existing simulate() convention), so NPDE/NPD omit the inter-occasion component of the predictive variance.
Estimation Methods
FOCEI (default)
method = focei
FOCE with Interaction. Includes the dependence of the residual error on random effects. More accurate than FOCE when the error model depends on individual predictions, but slightly slower.
FOCE
method = foce
First-Order Conditional Estimation. Linearizes the model around the empirical Bayes estimates. Fast and reliable for most models.
SAEM
method = saem
Stochastic Approximation EM. Uses Metropolis-Hastings sampling instead of MAP optimization for random effects. More robust to local minima; recommended for complex models with many random effects.
SAEM-Specific Options
| Key | Default | Description |
|---|---|---|
n_exploration |
150 |
Phase 1 iterations (step size = 1) |
n_convergence |
250 |
Phase 2 iterations (step size = 1/k) |
n_mh_steps |
20 |
Block Metropolis-Hastings steps per subject per iteration. Also sizes the componentwise decorrelating kernel that prevents block-Ω collapse (max(2, n_mh_steps / n_eta) sweeps; multi-η models only — skipped when n_eta < 2). When n_leapfrog > 0, this applies to subjects that fall back to MH (see below); HMC subjects use one proposal per iteration regardless. |
n_leapfrog |
0 |
Leapfrog steps per HMC proposal (0 = use MH; see below). When > 0, subjects for which HMC is unavailable (ODE model, missing analytical PK path, non-finite Ω, unsupported TV-cov path) fall back to MH using n_mh_steps proposals. |
adapt_interval |
50 |
Iterations between step-size adaptation |
omega_burnin |
20 |
Initial exploration iterations during which Ω (and ΩIOV) are held at their starting values while the MH chain warms up. Clamped to n_exploration; set 0 to disable. Prevents the Ω collapse described in the SAEM page. |
seed |
12345 |
RNG seed for reproducibility |
conddist |
false |
Run a post-fit conditional-distribution pass estimating each subject’s p(η_i \| y_i) by MCMC. Reports per-subject conditional mean/SD and distribution-based η-shrinkage (the shrinkage-unbiased basis for η diagnostics). See the SAEM page. |
conddist_nsamp |
200 |
Retained MCMC draws per subject in the conditional-distribution pass. Larger tightens the mean/SD estimates at linear cost. Production diagnostics want values in the thousands. Only used when conddist = true. |
conddist_burnin |
20 |
Burn-in draws discarded before accumulation, to forget the EBE-mode warm start. Only used when conddist = true. |
conddist_keep_samples |
false |
Retain the raw per-subject draws (written to {model}-conddist-samples.csv), not just the mean/SD. Only used when conddist = true. |
SIR (Sampling Importance Resampling)
SIR provides non-parametric parameter uncertainty estimates as an optional post-estimation step. Requires covariance = true.
| Key | Default | Description |
|---|---|---|
sir |
false |
Enable SIR uncertainty estimation |
sir_samples |
1000 |
Number of proposal samples (M) |
sir_resamples |
250 |
Number of resampled vectors (m) |
sir_seed |
12345 |
RNG seed for reproducibility |
sir_keep_samples |
false |
Retain resampled parameter vectors for simulate_with_uncertainty() |
sir_df |
5.0 |
Degrees of freedom for the Student-t proposal; higher values approach a normal proposal |
See SIR documentation for details.
Importance Sampling (IMP)
By default imp is a Monte-Carlo EM estimator (NONMEM METHOD=IMP): it maximises the importance-sampled marginal likelihood, updating θ/Ω/σ each iteration. Set imp_eval_only = true (NONMEM EONLY=1) to instead evaluate the marginal −2 log L at the fixed input parameters — a lower-bias −2 log L than the FOCE/Laplace OFV when subject posteriors of η are non-Gaussian (e.g. sparsely-sampled PK).
Behaviour change:
imppreviously only evaluated−2 log L. It now estimates by default. Addimp_eval_only = trueto recover the old behaviour.
[fit_options]
method = imp # estimate (NONMEM METHOD=IMP)
imp_iterations = 200
imp_samples = 1000
imp_averaging = 50
imp_proposal_df = 5 # or `normal` for a multivariate-normal proposal
imp_seed = 12345
[fit_options]
method = [focei, imp] # evaluate FOCEI's fit (NONMEM EONLY=1)
imp_eval_only = true
On rich data prefer
impmapor warm-start with[focei, imp]; plainimp’s one-iteration-lagged proposal can collapse the ESS on a sharp posterior. See the IMP documentation.
| Key | Default | Description |
|---|---|---|
imp_eval_only |
false |
true ⇒ evaluate −2 log L at fixed parameters (NONMEM EONLY=1); must be the terminal chain stage. false ⇒ estimate (NONMEM METHOD=IMP). |
imp_iterations |
200 |
MCEM iterations (estimator only). |
imp_averaging |
50 |
Terminal iterations averaged into the reported estimate (estimator only). |
imp_samples |
1000 |
Importance samples K per subject. 2000–5000 recommended for publication-quality MC SE. |
imp_proposal_df |
5.0 |
Student-t proposal degrees of freedom (≥ 1), or normal/mvn for a multivariate-normal proposal. Lower = heavier tails. |
imp_auto |
true |
Adaptive sample count (NONMEM AUTO). When true, imp_samples is the starting count and is ramped up (×2/iteration, cap 10000) while the objective’s Monte-Carlo SE exceeds 1.0. Recommended for high-dimensional / FREM models, where a fixed count biases the M-step. |
imp_seed |
12345 |
RNG seed. Same seed → identical result. |
imp_low_ess_threshold |
0.1 |
Subjects with normalized ESS below this fraction get flagged in the result. Set 0 to silence. |
imp_defensive_alpha |
0.0 |
Defensive-mixture weight (issue #528), opt-in. Each subject draws this fraction of its samples from the prior N(0, Ω) rather than the mode-centred proposal, and every sample is scored under the mixture density. This bounds the importance weights so a weakly-identified subject — e.g. an analytical [initial_conditions] baseline whose V cancels in the amplitude — can’t hijack the weighted M-step and walk θ to the bounds. Must be in [0, 1); the default 0 is the legacy single-proposal sampler (bit-comparable with NONMEM), set a small positive value (e.g. 0.1) to enable the rescue. Applies to imp and impmap (including the FREM Rao-Blackwell path); for an impmap stage you may also write impmap_defensive_alpha. Enabling it disables Sobol QMC and raises the per-subject ESS floor (so imp_low_ess_threshold flags fewer subjects). See Importance Sampling → Defensive mixture. |
See Importance Sampling documentation for the algorithm, the NONMEM mapping, IOV caveats, and tuning guidance.
IMPMAP (importance_sampling_map)
Like the estimating imp, impmap is a Monte-Carlo EM estimator — equivalent to NONMEM METHOD=IMPMAP. The difference is the proposal: impmap re-centers at the freshly-computed conditional mode (MAP) every iteration (robust on rich data), whereas imp re-centers from the previous iteration’s sample moments (cheaper, but fragile on rich data). Both update θ/Ω/σ from the importance-weighted posterior moments. It runs standalone or as a chain stage:
[fit_options]
method = importance_sampling_map # alias: impmap
impmap_iterations = 200
impmap_samples = 300
impmap_proposal_df = 4 # Student-t (default); `normal` for MVN
impmap_seed = 12345
| Key | Default | Description |
|---|---|---|
impmap_iterations |
200 |
Number of MCEM iterations (parameter updates). |
impmap_samples |
300 |
Importance samples K per subject per iteration. Larger K reduces Monte-Carlo noise at linear cost. |
impmap_proposal_df |
4 |
Proposal degrees of freedom. A finite value ≥ 1 gives a heavier-tailed Student-t (default 4); normal (or mvn) gives a multivariate-normal proposal (NONMEM’s default). The Gaussian’s lighter tails under-cover the posterior of weakly-identified parameters and bias the M-step moments, so ferx defaults to a Student-t. |
impmap_auto |
true |
Adaptive sample count (NONMEM AUTO). When true, impmap_samples is the starting count and is ramped up (×2/iteration, cap 10000) while the objective’s Monte-Carlo SE exceeds 1.0 (NONMEM STDOBJ). Strongly recommended for FREM / high-dimensional models — a fixed count leaves a sample-count-dependent bias in the typical-value and Ω estimates. |
impmap_averaging |
50 |
Final iterations whose parameters are averaged into the reported estimate (Monte-Carlo variance reduction). |
impmap_seed |
12345 |
RNG seed. Same seed → identical estimates. |
impmap_low_ess_threshold |
0.1 |
Subjects with normalized ESS below this fraction are flagged as poorly sampled. |
impmap_trace |
false |
When true, collect per-iteration parameter values into FitResult.impmap_trace — analogous to NONMEM .ext output for traceplots. |
impmap_mceta |
0 |
Number of additional random starting points for per-subject MAP optimization (analogous to NONMEM MCETA). Each start draws η from N(0, Ω). The start with the lowest individual NLL wins. 0 = single warm-start (default). 3 is a good choice for high-dimensional models (e.g. FREM with ≥ 5 ETAs). |
impmap_sobol |
false |
Use Sobol quasi-random sequences (with Cranley-Patterson randomization) for IS draws instead of pseudo-random. Gives more uniform posterior coverage with fewer samples. Only applies to MVN proposals (impmap_proposal_df = normal); Student-t falls back to pseudo-random. |
iscale_min |
0.1 |
Minimum proposal scaling factor for adaptive IS (NONMEM ISCALE_MIN). The IS proposal covariance is multiplied by s² where s is chosen from [iscale_min, iscale_max] to maximise per-subject ESS. Set both to 1.0 to disable. |
iscale_max |
10.0 |
Maximum proposal scaling factor (NONMEM ISCALE_MAX). |
frem_rao_blackwell |
true |
FREM only: Rao-Blackwellise the covariate ETAs (integrate them analytically, importance-sample only the PK ETAs). Strongly recommended — brute-force sampling of the near-singular covariate dimensions has very poor ESS. Set false only to diagnose the RB path against the full-dimensional sampler. |
impmap reuses inner_maxiter / inner_tol for the per-iteration MAP step. Inter-occasion variability ([iov] / kappa) and SDE ([diffusion]) models are not yet supported and are refused up front. See the IMPMAP documentation for the algorithm and the NONMEM comparison.
Optimizer Choices
| Optimizer | Description | Recommended For |
|---|---|---|
auto |
Picks nlopt_lbfgs (analytic gradient available) or bobyqa (FD only) per model |
Default; the recommended choice for most fits |
bobyqa |
NLopt BOBYQA — derivative-free quadratic interpolation | Robust on noisy / non-smooth FOCE surfaces (ODE/PD, sparse data, Hill-ridge models); what auto selects there |
slsqp |
Sequential Least Squares Programming (NLopt) | Smooth, well-conditioned analytical PK models where you want gradient-based convergence; pair with reconverge_gradient_interval if it stalls |
nlopt_lbfgs |
NLopt L-BFGS | Analytic-gradient FOCE/FOCEI models (fastest-to-optimum in #490 benchmarking); large parameter spaces; what auto selects there |
lbfgs |
Deprecated alias → nlopt_lbfgs |
— (was the built-in L-BFGS; see #483) |
bfgs |
Deprecated alias → nlopt_lbfgs |
— (was the built-in BFGS) |
mma |
Method of Moving Asymptotes (NLopt) | Constrained problems |
trust_region |
Newton trust-region with Steihaug CG subproblem (argmin) | Well-conditioned problems where second-order curvature helps convergence |
Notes: - auto (the default) chooses between nlopt_lbfgs and bobyqa from the model: when the exact analytic FOCE/FOCEI gradient is available (analytical PK model, in the sensitivity provider’s scope, and gradient not forced to fd) it uses the gradient-based nlopt_lbfgs; otherwise — ODE/PD models, LTBS/SDE, or gradient = fd, where the outer loop runs on finite differences — it falls back to the derivative-free bobyqa. Benchmarking across ~10 real FOCEI datasets (#490) found nlopt_lbfgs fastest-to-optimum on every analytic-gradient problem and bobyqa both fastest and most reliable on the FD-only problems. The fit output records the resolved optimizer as auto (<resolved>). Set optimizer explicitly to override the choice. - bobyqa does not use gradients, so it is robust to small discontinuities in the FOCE surface caused by EBE re-estimation, but it converges more slowly than gradient-based methods on smooth problems. - trust_region uses the analytic outer gradient (same subject_nll_pop_grad as the outer FOCE optimizer) and a BHHH approximate Hessian (H ≈ 4 Σ gᵢgᵢᵀ). The BHHH matrix is always positive semi-definite, so the Steihaug subproblem is well-conditioned even near constraints. The Steihaug CG budget defaults to ceil(sqrt(n_params)) — typically 5 for standard NLME models, which is far cheaper than the previous FD-Hessian path (O(n²) OFV evaluations per Hessian).
Parameter Scaling and EBE Convergence
| Key | Default | Description |
|---|---|---|
parameter_scaling |
auto, none, abs, rescale2 |
auto |
scale_params |
false |
Legacy boolean alias for parameter_scaling = abs. Divide each packed (log/Cholesky) coordinate by its initial magnitude before passing it to the optimizer. Off by default since issue #99: the scaling-enabled path only ever runs on log/Cholesky coordinates, where dividing by \|log value\| is counterproductive — e.g. ln(V)=ln(20)≈3 gets scale 3, turning the optimizer’s unit step into a ≈20× multiplicative jump in V, which overshoots and (via the uniform SLSQP gradient cap) starves the step in other dimensions such as OMEGA, halting short of the minimum. Prefer parameter_scaling = rescale2 for gradient-based optimizers. The OFV value is unchanged at any fixed point, but the optimizer trajectory and stop point are not. |
max_unconverged_frac |
0.1 |
Fraction of subjects (with at least min_obs_for_convergence_check observations) allowed to have unconverged EBEs before the outer optimizer rejects the step (returns OFV = ∞). Set to 1.0 to disable the guard. |
min_obs_for_convergence_check |
2 |
Subjects with fewer than this many observations are excluded from the max_unconverged_frac check (they still run normally). |
stagnation_guard |
true |
Short-circuit the NLopt-based outer optimizers once recent evals show no OFV improvement above 1e-3 over a window of 3*(n+1).max(50) evals. This lets SLSQP / L-BFGS terminate quickly via their own xtol/ftol on numerically-flat plateaus (e.g. γ-bearing FOCEI problems) instead of grinding through the remaining outer_maxiter budget at full inner-loop cost. Set to false to let the optimizer run to its natural termination criterion — useful when debugging or for problems with very slow but real OFV improvements below the threshold. |
ebe_warm_start |
false |
When an inner per-subject EBE solve fails its BFGS step and falls back to Nelder–Mead, warm-start the simplex from the BFGS partial η̂ rather than cold-starting from the prior mode η=0. A weakly-identified η (e.g. an unidentifiable peripheral volume) drives BFGS far out onto the steep prior slope, and NM slides down it to the mode in far fewer iterations than refining from 0 in the flat basin — substantially fewer prediction walks on fallback-heavy fits (≈1.7× on a 2-cpt unidentifiable-V2 benchmark). Opt-in: warm-starting moves the fallback subjects’ EBEs, which perturbs the outer optimiser’s trajectory — harmless for the derivative-free BOBYQA default, but it can derail a gradient-based outer optimiser (e.g. mma) into a worse basin on some models. Enable only after validating OFV/estimates on your model + optimizer; leave false for the historical cold-restart behaviour. |
Options That Don’t Apply to the Selected Method
If you set an option that the chosen estimation method doesn’t consume (e.g. n_convergence with method = focei, or optimizer with method = saem), fit() emits a warning listing the option, the selected method, and the keys that are available for that method. The option is ignored — the fit still proceeds.
For a chained fit (method = [saem, focei]), an option is kept if it applies to any stage in the chain, so SAEM and FOCEI keys can be mixed without warnings.
Multi-Start Optimization
| Key | Default | Description |
|---|---|---|
n_starts |
1 |
Number of independent optimization runs. 1 disables multi-start (no behaviour change). When > 1, all starts run in parallel via rayon; the converged run with the lowest OFV is returned. Start 0 always uses the exact initial values from the model file. |
start_sigma |
0.3 |
Log-space perturbation applied to initial theta values for starts 1..n. Log-packed thetas are multiplied by exp(N(0, start_sigma)); thetas with negative lower bounds are shifted additively. |
multi_start_seed |
42 |
RNG seed for the multi-start theta perturbations. Independent of seed (SAEM) so that changing the SAEM seed does not silently alter which perturbed starting points are used in FOCE multi-start runs. |
Multi-start is most useful for models prone to local minima: nonlinear elimination (Michaelis-Menten), full-block omega, or many covariate parameters. On an 8-core machine n_starts = 8 costs the same wall-clock time as a single run but has ~8× lower probability of a local minimum.
Global Search
Setting global_search = true runs a gradient-free pre-search (NLopt CRS2-LM) before the local optimizer. This helps escape local minima on challenging datasets.
The number of global evaluations is auto-scaled based on the number of parameters and observations, or can be set explicitly with global_maxeval.
Examples
Standard FOCEI with defaults:
[fit_options]
method = focei
maxiter = 300
covariance = true
FOCEI with global search:
[fit_options]
method = focei
maxiter = 500
covariance = true
global_search = true
SAEM with custom settings:
[fit_options]
method = saem
n_exploration = 200
n_convergence = 300
n_mh_steps = 5
seed = 42
covariance = true
FOCEI with SIR uncertainty:
[fit_options]
method = focei
covariance = true
sir = true
sir_samples = 1000
sir_resamples = 250
sir_seed = 42
Derivative-free BOBYQA fit:
[fit_options]
method = foce
optimizer = bobyqa
maxiter = 300
inner_maxiter = 100
inner_tol = 1e-6
Trust-region with tuned CG subproblem:
[fit_options]
method = foce
optimizer = trust_region
maxiter = 200
steihaug_max_iters = 30
FOCE with Inter-Occasion Variability:
[fit_options]
method = foce
iov_column = OCC
covariance = true
Enable optimizer trace and EBE guard:
[fit_options]
method = foce
optimizer_trace = true
max_unconverged_frac = 0.5
min_obs_for_convergence_check = 3
Optimizer Trace
When optimizer_trace = true, a CSV is written to /tmp/ferx_trace_<pid>_<ts>.csv and the path is stored in FitResult::trace_path. Each row is one outer iteration.
| Column | Populated by | Description |
|---|---|---|
iter |
all | Iteration number |
method |
all | foce, focei, gn, gn_hybrid, saem |
phase |
gn_hybrid, saem | focei (polish) or explore/converge |
ofv |
all | Objective function value |
wall_ms |
all | Wall time for this iteration (ms) |
grad_norm |
BFGS, NLopt gradient-mode | Gradient norm |
step_norm |
BFGS | Step size |
inner_iter_count |
(reserved) | Reserved for future per-iteration inner-loop count; currently NA |
optimizer |
FOCE/FOCEI | Active NLopt algorithm |
lm_lambda |
GN | Levenberg-Marquardt damping factor |
ofv_delta |
GN | Change in OFV from previous iteration |
step_accepted |
GN | Whether the GN step was accepted |
cond_nll |
SAEM | Conditional observation NLL |
gamma |
SAEM | SAEM step-size (1 in exploration, 1/k in convergence) |
mh_accept_rate |
SAEM | Mean acceptance rate across all subjects (MH or HMC). In mixed HMC/MH runs (n_leapfrog > 0 with some MH-fallback subjects) this is an aggregate across both samplers. |
n_ebe_unconverged |
FOCE/FOCEI | Subjects whose inner optimizer did not converge |
n_ebe_fallback |
FOCE/FOCEI | Subjects that fell back to Nelder-Mead |
Unused columns contain NA. The trace is buffered and flushed when the fit ends; if a run is killed mid-iteration the buffered tail may be lost.