Fit Options

The optional [fit_options] block configures the estimation method and optimizer settings.

Syntax

[fit_options]
  key = value

General Options

Key Values Default Description
method foce, focei, saem, gn, gn_hybrid, impmap, imp (only as final chain stage) focei Estimation method (or single-stage method in a chain). See chained fits. When neither this key nor a wrapper’s method argument is set, the engine falls back to focei and emits a warning so the implicit choice is visible — set method here to silence it.
maxiter integer 500 Maximum outer loop iterations. maxiter = 0 is evaluation only (NONMEM MAXEVAL=0): the engine runs one inner EBE solve at the initial parameters and reports the objective there with no outer optimisation — useful for scoring a fixed parameter set or computing standard errors at known estimates (with covariance = true). The reported fit is marked not-converged, and the parameters are returned unchanged.
covariance true, false true Compute covariance matrix and standard errors
covariance_method r, s, rsr r Which covariance estimator to assemble, mirroring NONMEM $COVARIANCE MATRIX=. r = inverse Hessian R⁻¹ (model-based; the default). s = inverse score cross-product S⁻¹ (empirical information). rsr = the Huber–White sandwich R⁻¹SR⁻¹ (robust to model mis-specification; NONMEM’s default). ferx defaults to r, but NONMEM’s $COVARIANCE default is rsr — set covariance_method = rsr when reconciling SEs against a NONMEM run that used the default $COV. s/rsr are supported for FOCEI, FOCE, and IOV fits; all three anchored against NONMEM $COV MATRIX=S/RSR within ~10%. All methods require a positive-definite FD Hessian (a non-PD Hessian is not yet routed to the s cross-product). Requires covariance = true.
covariance_fallback none, sir none What to do when the FD Hessian is not positive definite (non-PD). none leaves the covariance step as failed. sir runs SIR with a fallback proposal covariance built from the \|eigenvalue\|-rectified Hessian, inflated 4×; the YAML reports SIR-based 95% CIs instead of Hessian SEs. Requires covariance = true. Ignored when the Hessian succeeds.
fd_hessian_step float 1e-2 Initial relative step size for the finite-difference Hessian used in the covariance step. The actual perturbation for parameter i is fd_hessian_step * (1 + |x_hat[i]|). ferx automatically halves this value up to 8 times if any diagonal stencil evaluates to a non-finite value, so in most cases the default requires no tuning. Set explicitly only if the automatic reduction is insufficient (try 0.1) or if FD noise is the main concern on a very smooth surface (try 1e-3). Must be positive and finite.
optimizer auto, bobyqa, slsqp, nlopt_lbfgs, mma, trust_region (plus deprecated aliases bfgs/lbfgsnlopt_lbfgs) auto Outer-loop optimisation algorithm. Applies to method = foce / focei and to the FOCEI polish phase of method = gn_hybrid. The default auto picks per model: nlopt_lbfgs when the exact analytic FOCE/FOCEI gradient is available, and bobyqa when only finite differences are (ODE/PD models, LTBS/SDE, or gradient = fd) — see the Outer Optimizers page for the applicability matrix and the benchmarking (#490) behind these defaults. The fit output records the resolved pick (e.g. auto (nlopt_lbfgs)). Ignored (with a runtime warning) under method = saem / imp / gn. bfgs and lbfgs are deprecated aliases that now select nlopt_lbfgs (the hand-rolled built-in BFGS/L-BFGS is slated for removal — see #483).
outer_xtol float 1e-4 Relative step tolerance for the derivative-free bobyqa outer optimizer (NLopt xtol_rel): how far its trust radius must shrink to declare success. 1e-4 in scaled log-space is a ~0.01% move on the natural parameter scale. Only affects bobyqa (the auto pick for FD-objective models — ODE/PD, LTBS/SDE, gradient = fd, and TTE); gradient-based optimizers use their own fixed internal tolerances. Must be positive and finite.
outer_ftol float auto Relative objective tolerance for the derivative-free bobyqa outer optimizer (NLopt ftol_rel): the relative OFV change below which it stops. Unset = auto: 1e-8 for a pure-TTE model and 1e-6 otherwise. The TTE tightening fixes #469 — on the near-flat frailty-ω² ridge of a nonlinear hazard parameter (the whole ridge spans <0.01 OFV) the historical 1e-6 stopped the optimizer short, leaving the variance above its own minimum (a Weibull shape-frailty read ω² 0.204 vs the NONMEM/nlmixr2 0.175 consensus; at 1e-8 it lands 0.176). The 1e-6 floor for non-TTE fits is deliberate: a TTE hazard objective is evaluated exactly, but on a noisy objective (ODE solver error, FD-inner FOCE such as LTBS) 1e-8 is unreachable, so BOBYQA grinds toward its maxeval budget instead of converging (≈3× the evaluations on an ODE fit). Set an explicit value to pin it for every model — e.g. outer_ftol = 1e-8 to tighten an LTBS flat-ridge fit, accepting the cost. Must be positive and finite.
inner_maxiter integer 200 Max iterations for the inner (per-subject EBE) optimizer
inner_optimizer auto, bfgs, lbfgs, nelder_mead auto Inner-loop (per-subject EBE) optimisation algorithm. auto uses dense BFGS, switching to limited-memory L-BFGS above 32 random effects. The explicit values pin a single algorithm with no automatic switching: bfgs (dense), lbfgs (limited-memory two-loop recursion — cheaper per step at high random-effect dimension), nelder_mead (derivative-free, for diagnosing gradient issues). All gradient-based variants converge to the same EBE; the choice trades per-step cost against memory. Applies to every method with an inner loop (foce/focei/gn/gn_hybrid/saem/impmap).
inner_tol float 1e-5 Gradient-norm convergence tolerance for the inner (per-subject EBE) optimizer. A looser tolerance leaves residual noise in each subject’s EBE solution, which propagates into the marginal objective the outer optimizer sees; on models with a noisy or flat marginal surface (FD-inner FOCE such as log-transform-both-sides) that noise can make the derivative-free BOBYQA outer optimizer false-converge above the true minimum. 1e-5 removes enough of that noise to match NONMEM’s minimum, at roughly 1.5× the per-fit cost of the older 1e-4 default. (For reference, NONMEM runs its inner conditional optimization far tighter than its outer convergence: the inner precision SIGL defaults to ~10 significant digits, versus the outer NSIG/SIGDIGITS default of 3.) Tighter is not uniformly better: at 1e-6 some ill-conditioned fits over-converge the inner Hessian and the outer optimizer can land in a worse basin. Loosen it (e.g. 1e-4) to speed up well-conditioned fits where it makes no difference; change it further only with validation against a reference fit.
cov_inner_tol float inner_tol (LTBS: min(inner_tol, 1e-8)) Inner EBE-reconvergence tolerance used only by the covariance step, decoupled from inner_tol. The covariance R-matrix is a second-difference of the reconverged OFV and is more sensitive to EBE precision than the fit itself, so on a flat or weakly-identified surface an EBE converged only to the fit’s inner_tol can leave enough residual noise to perturb the standard errors (e.g. the heavily-censored M3 + IOV case in #654 — set cov_inner_tol = 1e-11). This lets the covariance step reconverge tighter without slowing every outer iteration. Unset (default) uses inner_tol for ordinary models (SEs unchanged) and min(inner_tol, 1e-8) for LTBS models, whose g = ln(f) covariance Hessian needs the tighter reconvergence. The default is deliberately not tightened for every model — over-converging some ill-conditioned inner Hessians (e.g. IOV block-Ω) drives the covariance indefinite — so opt in per model with this key.
steihaug_max_iters integer adaptive Max CG iterations for the Steihaug subproblem (only used when optimizer = trust_region). Default (unset) uses ceil(sqrt(n_params)).clamp(5, n_params) — typically 5 for standard NLME models. Set explicitly (e.g. steihaug_max_iters = 50) to pin the budget.
ode_reltol float 1e-4 RK45 ODE solver relative tolerance. ODE models only (ignored for analytical PK). The default reproduces analytical closed forms in PRED to ~1e-4, but the FOCE objective amplifies solver error, so the OFV of an ODE-form model can differ from its analytical equivalent by several units. Set a tighter value (e.g. 1e-10) when the ODE-form OFV must match an analytical reference; expect slower fits.
ode_abstol float 1e-6 RK45 ODE solver absolute tolerance (companion to ode_reltol). ODE models only.
ode_max_steps integer 10000 Max RK45 steps per integration segment. ODE models only. Raise (e.g. 1000000) if a tight ode_reltol exhausts the step budget on stiff multi-compartment systems.
global_search true, false false Run NLopt CRS2-LM (Controlled Random Search with Local Mutation) as a gradient-free global pre-search before the local optimizer. CRS2-LM samples within the parameter bounds; the local optimizer (e.g. bobyqa, slsqp) starts from the best point found. Useful for poorly-identified models — when the local optimizer can land in a degenerate basin (collapsed ETA, V/Q swap, parameters at bounds) from a far-from-truth start, the global pre-search usually escapes it. Adds the pre-search budget on top of the local optimisation, but typically more efficient than running multiple full fits from scratch. Requires a full NLopt build (e.g. brew install nlopt or apt install libnlopt-dev); a clear warning is emitted if CRS2-LM is unavailable.
global_maxeval integer 200 * (n_params + 1) Maximum evaluations of the FOCE objective during the global pre-search. Each eval is a full inner-loop pass over all subjects, so this is the dominant cost of global_search = true. The default (0 → auto) is empirically enough to escape bad basins on 10–20 parameter PK models without dominating the wall time of the subsequent local refine.
bloq_method drop, m3 drop How to handle rows with CENS=1 (below LLOQ) or CENS=-1 (above ULOQ). m3 enables Beal’s M3 likelihood (see BLOQ example).
npde_nsim integer ≥ 0 0 Number of Monte-Carlo replicates per subject used to compute the simulation-based NPDE/NPD diagnostics after the fit. 0 (default) disables the computation — no NPDE/NPD columns are emitted. A typical value is 1000. Cost scales linearly with the count.
npde_seed integer RNG seed for the NPDE/NPD simulation, for reproducible diagnostics. Unset falls back to a fixed default. Only used when npde_nsim > 0.
mu_referencing true, false true Re-centre inner-loop ETA estimates on the current population mean (auto-detected from [individual_parameters]). See the FAQ entry for details. Set false to reproduce pre-automatic-mu behaviour.
iov_column string Name of the occasion column in the dataset (e.g. OCC). Required when the model uses kappa or block_kappa declarations. The column must contain integer occasion indices. Case-insensitive. Only supported with foce / focei — not saem. See IOV documentation.
optimizer_trace true, false false Write a per-iteration CSV to /tmp/ferx_trace_<pid>_<ts>.csv. The path is stored in FitResult::trace_path. Useful for diagnosing convergence problems or comparing optimizers. See Optimizer Trace.
reconverge_gradient_interval integer ≥ 0 0 How often to re-solve each subject’s inner EBE loop during the population gradient instead of holding the EBEs (η̂) and FOCE Hessian fixed. 0 (default) never reconverges — the cheap fixed-EBE gradient is used. 1 reconverges on every gradient evaluation; N reconverges on evaluations 0, N, 2N, … and uses the cheap gradient in between (amortizing the cost by ~N× while still periodically correcting the search direction). The fixed-EBE gradient omits the inner solution’s response to θ/Ω, so on ill-conditioned fits a gradient optimizer (e.g. slsqp) can stall well above the derivative-free (bobyqa) optimum; reconverging recovers the full surface at roughly 5–6× the per-gradient cost at interval 1. IOV models (kappa/block_kappa) always reconverge and ignore this setting. See the FOCE/FOCEI page.
inits_from_nca true, false, nca, nca_sweep, nca_ebe false Derive NCA-based starting values from the data before the optimizer loop. true (alias for nca_sweep) and false/off toggle the default strategy; the three named values pick a strategy explicitly (see NCA-based starting values). Fixed thetas are never overwritten; covariate-effect thetas (no mu-referencing link) keep the model default. Most useful with method = trust_region or method = gn / method = gn_hybrid, where bad starting values can cause early stalling. The same estimation is available without running a fit via the CLI flag --inits-from-nca[=METHOD] and (in ferx-r) the ferx_inits_from_nca() function.

NCA-based starting values

inits_from_nca estimates starting values directly from the data using non-compartmental analysis (NCA), then optionally refines parameters NCA cannot estimate. All three strategies are NCA-based; they differ only in how much refinement runs on top:

Value Strategy What it does Typical cost
nca NCA only Per-subject NCA (AUC, terminal slope, Wagner–Nelson Ka, biexponential peeling for 2/3-cpt) pooled to population geometric means. Leaves parameters NCA can’t reach (peripheral Q/V2, all ODE/PD thetas) at the model default. < 5 ms
nca_sweep NCA + sweep Runs nca, then sweeps every remaining non-fixed theta over a log-space rRMSE grid using population predictions (etas = 0). Model-agnostic — also covers ODE/PD models. This is what true selects. < 500 ms (analytical)
nca_ebe NCA + EBE sweep Like nca_sweep but evaluates the grid with empirical Bayes estimates (etas ≠ 0); more accurate under large IIV (omega > ~0.2). Falls back to nca_sweep for ODE models. < 500 ms (analytical)

The CL eta’s omega is also updated from inter-subject CV² when ≥ 3 subjects have a valid NCA estimate.

When nca_sweep is enabled but the fit fails to converge or the OFV looks suspiciously high, try nca_ebe.

Simulation-based diagnostics (NPDE / NPD)

Setting npde_nsim > 0 adds two simulation-based goodness-of-fit columns to the sdtab output: NPD (Normalized Prediction Discrepancies) and NPDE (Normalized Prediction Distribution Errors). Unlike CWRES — which linearises the model around the conditional mode and so inherits the bias of a first-order approximation — NPDE/NPD are built entirely from Monte-Carlo simulation under the fitted model, so they are robust to model nonlinearity and to non-Gaussian random effects (Brendel et al. 2006; Comets et al. 2008, the npde R package).

[fit_options]
method = focei
npde_nsim = 1000     # replicates per subject (off when 0, the default)
npde_seed = 12345    # optional, for reproducibility

The effective seed actually used — the explicit npde_seed when given, otherwise the built-in default — is recorded as model: npde_seed: in the {model}-fit.yaml output (and round-trips through .fitrx), so the NPDE/NPD columns can always be regenerated from the saved fit, even when no seed was set in the model file. The line is omitted when NPDE did not run (npde_nsim = 0).

For each observation y_ij, ferx simulates K = npde_nsim replicates under the fitted θ/Ω/Σ (sampling η ~ N(0, Ω) and the residual error), evaluated at the subject’s own observation design:

  • NPD (no decorrelation): pd_ij is the empirical CDF of the simulated values at observation ij, and npd_ij = Φ⁻¹(pd_ij).
  • NPDE (decorrelated): within each subject the observed and simulated vectors are decorrelated with the empirical mean and Cholesky factor of the simulated covariance (the Brendel/Comets procedure) before the empirical CDF and inverse- normal transform. This removes the within-subject correlation that NPD ignores.

Empirical-CDF probabilities at 0 or 1 are clamped to [1/(2K), 1 − 1/(2K)] before Φ⁻¹, matching the npde-package convention, so the scores stay finite.

Under a correctly specified model, NPDE follows a standard normal distribution. On the warfarin example (examples/warfarin.ferx, data/warfarin.csv), a converged FOCEI fit with npde_nsim = 1000 gives the expected mean-zero, unit-variance distribution:

Diagnostic mean variance reference
NPDE 0.006 1.02 ≈ N(0, 1)
NPD −0.002 0.88 mean ≈ 0; variance < 1 because NPD retains within-subject correlation

This matches the npde R package’s autonpde() (same Cholesky decorrelation and edge-clamping) to Monte-Carlo noise. See tests/npde_validation.rs for the engine-side check and an R reproduction recipe.

NONMEM cross-validation

NONMEM has no native NPDE output; the reference pipeline is a NONMEM $SIMULATION post-processed by the npde R package. The same warfarin data was fit with NONMEM 7.5.1 (ADVAN2 TRANS2, METHOD=1 INTER, proportional error) and the two fits agree, so any difference in the diagnostics is attributable to the NPDE computation rather than the fit:

Quantity ferx (FOCEI) NONMEM 7.5.1
TVCL 0.1328 0.1327
TVV 7.738 7.738
TVKA 0.817 0.811
OFV −286.0015 −286.0042

Feeding a 1000-replicate NONMEM $SIMULATION (under the NONMEM estimates) and the observed data to npde::autonpde() (Cholesky decorrelation, edge-clamping) reproduces ferx’s NPDE/NPD on the same 110 observations:

Diagnostic ferx mean ferx var npde-pkg mean npde-pkg var
NPDE 0.005 1.09 0.036 1.04
NPD −0.009 0.91 −0.004 0.91

The NPD variance (the quantity that should sit below 1 because NPD retains the within-subject correlation) matches to three figures (0.91 vs 0.91); the small NPDE-variance gap is Monte-Carlo noise between two independent simulation draws. Both engines give NPDE ≈ N(0, 1) and a mean-zero NPD, as expected under this well-specified model.

Limitations. M3/BLQ censored observations need the predictive-CDF variant and are out of scope: censored rows (CENS != 0) are emitted as empty/NaN (matching the CWRES/IWRES convention), and a subject’s NPDE is NaN as a whole when it has any censored row, since the within-subject decorrelation would otherwise mix the LLOQ into the uncensored rows. NPDE also requires more replicates than a subject has observations (npde_nsim > n_obs) for a full-rank simulated covariance; otherwise that subject’s NPDE is NaN (NPD is still computed). For IOV (kappa) models the simulation holds kappas at zero (the existing simulate() convention), so NPDE/NPD omit the inter-occasion component of the predictive variance.

Estimation Methods

FOCEI (default)

method = focei

FOCE with Interaction. Includes the dependence of the residual error on random effects. More accurate than FOCE when the error model depends on individual predictions, but slightly slower.

FOCE

method = foce

First-Order Conditional Estimation. Linearizes the model around the empirical Bayes estimates. Fast and reliable for most models.

SAEM

method = saem

Stochastic Approximation EM. Uses Metropolis-Hastings sampling instead of MAP optimization for random effects. More robust to local minima; recommended for complex models with many random effects.

SAEM-Specific Options

Key Default Description
n_exploration 150 Phase 1 iterations (step size = 1)
n_convergence 250 Phase 2 iterations (step size = 1/k)
n_mh_steps 20 Block Metropolis-Hastings steps per subject per iteration. Also sizes the componentwise decorrelating kernel that prevents block-Ω collapse (max(2, n_mh_steps / n_eta) sweeps; multi-η models only — skipped when n_eta < 2). When n_leapfrog > 0, this applies to subjects that fall back to MH (see below); HMC subjects use one proposal per iteration regardless.
n_leapfrog 0 Leapfrog steps per HMC proposal (0 = use MH; see below). When > 0, subjects for which HMC is unavailable (ODE model, missing analytical PK path, non-finite Ω, unsupported TV-cov path) fall back to MH using n_mh_steps proposals.
adapt_interval 50 Iterations between step-size adaptation
omega_burnin 20 Initial exploration iterations during which Ω (and ΩIOV) are held at their starting values while the MH chain warms up. Clamped to n_exploration; set 0 to disable. Prevents the Ω collapse described in the SAEM page.
seed 12345 RNG seed for reproducibility
conddist false Run a post-fit conditional-distribution pass estimating each subject’s p(η_i \| y_i) by MCMC. Reports per-subject conditional mean/SD and distribution-based η-shrinkage (the shrinkage-unbiased basis for η diagnostics). See the SAEM page.
conddist_nsamp 200 Retained MCMC draws per subject in the conditional-distribution pass. Larger tightens the mean/SD estimates at linear cost. Production diagnostics want values in the thousands. Only used when conddist = true.
conddist_burnin 20 Burn-in draws discarded before accumulation, to forget the EBE-mode warm start. Only used when conddist = true.
conddist_keep_samples false Retain the raw per-subject draws (written to {model}-conddist-samples.csv), not just the mean/SD. Only used when conddist = true.

SIR (Sampling Importance Resampling)

SIR provides non-parametric parameter uncertainty estimates as an optional post-estimation step. Requires covariance = true.

Key Default Description
sir false Enable SIR uncertainty estimation
sir_samples 1000 Number of proposal samples (M)
sir_resamples 250 Number of resampled vectors (m)
sir_seed 12345 RNG seed for reproducibility
sir_keep_samples false Retain resampled parameter vectors for simulate_with_uncertainty()
sir_df 5.0 Degrees of freedom for the Student-t proposal; higher values approach a normal proposal

See SIR documentation for details.

Importance Sampling (IMP)

By default imp is a Monte-Carlo EM estimator (NONMEM METHOD=IMP): it maximises the importance-sampled marginal likelihood, updating θ/Ω/σ each iteration. Set imp_eval_only = true (NONMEM EONLY=1) to instead evaluate the marginal −2 log L at the fixed input parameters — a lower-bias −2 log L than the FOCE/Laplace OFV when subject posteriors of η are non-Gaussian (e.g. sparsely-sampled PK).

Behaviour change: imp previously only evaluated −2 log L. It now estimates by default. Add imp_eval_only = true to recover the old behaviour.

[fit_options]
  method        = imp            # estimate (NONMEM METHOD=IMP)
  imp_iterations = 200
  imp_samples    = 1000
  imp_averaging  = 50
  imp_proposal_df = 5             # or `normal` for a multivariate-normal proposal
  imp_seed       = 12345
[fit_options]
  method        = [focei, imp]   # evaluate FOCEI's fit (NONMEM EONLY=1)
  imp_eval_only  = true

On rich data prefer impmap or warm-start with [focei, imp]; plain imp’s one-iteration-lagged proposal can collapse the ESS on a sharp posterior. See the IMP documentation.

Key Default Description
imp_eval_only false true ⇒ evaluate −2 log L at fixed parameters (NONMEM EONLY=1); must be the terminal chain stage. false ⇒ estimate (NONMEM METHOD=IMP).
imp_iterations 200 MCEM iterations (estimator only).
imp_averaging 50 Terminal iterations averaged into the reported estimate (estimator only).
imp_samples 1000 Importance samples K per subject. 2000–5000 recommended for publication-quality MC SE.
imp_proposal_df 5.0 Student-t proposal degrees of freedom (≥ 1), or normal/mvn for a multivariate-normal proposal. Lower = heavier tails.
imp_auto true Adaptive sample count (NONMEM AUTO). When true, imp_samples is the starting count and is ramped up (×2/iteration, cap 10000) while the objective’s Monte-Carlo SE exceeds 1.0. Recommended for high-dimensional / FREM models, where a fixed count biases the M-step.
imp_seed 12345 RNG seed. Same seed → identical result.
imp_low_ess_threshold 0.1 Subjects with normalized ESS below this fraction get flagged in the result. Set 0 to silence.
imp_defensive_alpha 0.0 Defensive-mixture weight (issue #528), opt-in. Each subject draws this fraction of its samples from the prior N(0, Ω) rather than the mode-centred proposal, and every sample is scored under the mixture density. This bounds the importance weights so a weakly-identified subject — e.g. an analytical [initial_conditions] baseline whose V cancels in the amplitude — can’t hijack the weighted M-step and walk θ to the bounds. Must be in [0, 1); the default 0 is the legacy single-proposal sampler (bit-comparable with NONMEM), set a small positive value (e.g. 0.1) to enable the rescue. Applies to imp and impmap (including the FREM Rao-Blackwell path); for an impmap stage you may also write impmap_defensive_alpha. Enabling it disables Sobol QMC and raises the per-subject ESS floor (so imp_low_ess_threshold flags fewer subjects). See Importance Sampling → Defensive mixture.

See Importance Sampling documentation for the algorithm, the NONMEM mapping, IOV caveats, and tuning guidance.

IMPMAP (importance_sampling_map)

Like the estimating imp, impmap is a Monte-Carlo EM estimator — equivalent to NONMEM METHOD=IMPMAP. The difference is the proposal: impmap re-centers at the freshly-computed conditional mode (MAP) every iteration (robust on rich data), whereas imp re-centers from the previous iteration’s sample moments (cheaper, but fragile on rich data). Both update θ/Ω/σ from the importance-weighted posterior moments. It runs standalone or as a chain stage:

[fit_options]
  method             = importance_sampling_map   # alias: impmap
  impmap_iterations  = 200
  impmap_samples     = 300
  impmap_proposal_df = 4          # Student-t (default); `normal` for MVN
  impmap_seed        = 12345
Key Default Description
impmap_iterations 200 Number of MCEM iterations (parameter updates).
impmap_samples 300 Importance samples K per subject per iteration. Larger K reduces Monte-Carlo noise at linear cost.
impmap_proposal_df 4 Proposal degrees of freedom. A finite value ≥ 1 gives a heavier-tailed Student-t (default 4); normal (or mvn) gives a multivariate-normal proposal (NONMEM’s default). The Gaussian’s lighter tails under-cover the posterior of weakly-identified parameters and bias the M-step moments, so ferx defaults to a Student-t.
impmap_auto true Adaptive sample count (NONMEM AUTO). When true, impmap_samples is the starting count and is ramped up (×2/iteration, cap 10000) while the objective’s Monte-Carlo SE exceeds 1.0 (NONMEM STDOBJ). Strongly recommended for FREM / high-dimensional models — a fixed count leaves a sample-count-dependent bias in the typical-value and Ω estimates.
impmap_averaging 50 Final iterations whose parameters are averaged into the reported estimate (Monte-Carlo variance reduction).
impmap_seed 12345 RNG seed. Same seed → identical estimates.
impmap_low_ess_threshold 0.1 Subjects with normalized ESS below this fraction are flagged as poorly sampled.
impmap_trace false When true, collect per-iteration parameter values into FitResult.impmap_trace — analogous to NONMEM .ext output for traceplots.
impmap_mceta 0 Number of additional random starting points for per-subject MAP optimization (analogous to NONMEM MCETA). Each start draws η from N(0, Ω). The start with the lowest individual NLL wins. 0 = single warm-start (default). 3 is a good choice for high-dimensional models (e.g. FREM with ≥ 5 ETAs).
impmap_sobol false Use Sobol quasi-random sequences (with Cranley-Patterson randomization) for IS draws instead of pseudo-random. Gives more uniform posterior coverage with fewer samples. Only applies to MVN proposals (impmap_proposal_df = normal); Student-t falls back to pseudo-random.
iscale_min 0.1 Minimum proposal scaling factor for adaptive IS (NONMEM ISCALE_MIN). The IS proposal covariance is multiplied by where s is chosen from [iscale_min, iscale_max] to maximise per-subject ESS. Set both to 1.0 to disable.
iscale_max 10.0 Maximum proposal scaling factor (NONMEM ISCALE_MAX).
frem_rao_blackwell true FREM only: Rao-Blackwellise the covariate ETAs (integrate them analytically, importance-sample only the PK ETAs). Strongly recommended — brute-force sampling of the near-singular covariate dimensions has very poor ESS. Set false only to diagnose the RB path against the full-dimensional sampler.

impmap reuses inner_maxiter / inner_tol for the per-iteration MAP step. Inter-occasion variability ([iov] / kappa) and SDE ([diffusion]) models are not yet supported and are refused up front. See the IMPMAP documentation for the algorithm and the NONMEM comparison.

Optimizer Choices

Optimizer Description Recommended For
auto Picks nlopt_lbfgs (analytic gradient available) or bobyqa (FD only) per model Default; the recommended choice for most fits
bobyqa NLopt BOBYQA — derivative-free quadratic interpolation Robust on noisy / non-smooth FOCE surfaces (ODE/PD, sparse data, Hill-ridge models); what auto selects there
slsqp Sequential Least Squares Programming (NLopt) Smooth, well-conditioned analytical PK models where you want gradient-based convergence; pair with reconverge_gradient_interval if it stalls
nlopt_lbfgs NLopt L-BFGS Analytic-gradient FOCE/FOCEI models (fastest-to-optimum in #490 benchmarking); large parameter spaces; what auto selects there
lbfgs Deprecated aliasnlopt_lbfgs — (was the built-in L-BFGS; see #483)
bfgs Deprecated aliasnlopt_lbfgs — (was the built-in BFGS)
mma Method of Moving Asymptotes (NLopt) Constrained problems
trust_region Newton trust-region with Steihaug CG subproblem (argmin) Well-conditioned problems where second-order curvature helps convergence

Notes: - auto (the default) chooses between nlopt_lbfgs and bobyqa from the model: when the exact analytic FOCE/FOCEI gradient is available (analytical PK model, in the sensitivity provider’s scope, and gradient not forced to fd) it uses the gradient-based nlopt_lbfgs; otherwise — ODE/PD models, LTBS/SDE, or gradient = fd, where the outer loop runs on finite differences — it falls back to the derivative-free bobyqa. Benchmarking across ~10 real FOCEI datasets (#490) found nlopt_lbfgs fastest-to-optimum on every analytic-gradient problem and bobyqa both fastest and most reliable on the FD-only problems. The fit output records the resolved optimizer as auto (<resolved>). Set optimizer explicitly to override the choice. - bobyqa does not use gradients, so it is robust to small discontinuities in the FOCE surface caused by EBE re-estimation, but it converges more slowly than gradient-based methods on smooth problems. - trust_region uses the analytic outer gradient (same subject_nll_pop_grad as the outer FOCE optimizer) and a BHHH approximate Hessian (H ≈ 4 Σ gᵢgᵢᵀ). The BHHH matrix is always positive semi-definite, so the Steihaug subproblem is well-conditioned even near constraints. The Steihaug CG budget defaults to ceil(sqrt(n_params)) — typically 5 for standard NLME models, which is far cheaper than the previous FD-Hessian path (O(n²) OFV evaluations per Hessian).

Parameter Scaling and EBE Convergence

Key Default Description
parameter_scaling auto, none, abs, rescale2 auto
scale_params false Legacy boolean alias for parameter_scaling = abs. Divide each packed (log/Cholesky) coordinate by its initial magnitude before passing it to the optimizer. Off by default since issue #99: the scaling-enabled path only ever runs on log/Cholesky coordinates, where dividing by \|log value\| is counterproductive — e.g. ln(V)=ln(20)≈3 gets scale 3, turning the optimizer’s unit step into a ≈20× multiplicative jump in V, which overshoots and (via the uniform SLSQP gradient cap) starves the step in other dimensions such as OMEGA, halting short of the minimum. Prefer parameter_scaling = rescale2 for gradient-based optimizers. The OFV value is unchanged at any fixed point, but the optimizer trajectory and stop point are not.
max_unconverged_frac 0.1 Fraction of subjects (with at least min_obs_for_convergence_check observations) allowed to have unconverged EBEs before the outer optimizer rejects the step (returns OFV = ∞). Set to 1.0 to disable the guard.
min_obs_for_convergence_check 2 Subjects with fewer than this many observations are excluded from the max_unconverged_frac check (they still run normally).
stagnation_guard true Short-circuit the NLopt-based outer optimizers once recent evals show no OFV improvement above 1e-3 over a window of 3*(n+1).max(50) evals. This lets SLSQP / L-BFGS terminate quickly via their own xtol/ftol on numerically-flat plateaus (e.g. γ-bearing FOCEI problems) instead of grinding through the remaining outer_maxiter budget at full inner-loop cost. Set to false to let the optimizer run to its natural termination criterion — useful when debugging or for problems with very slow but real OFV improvements below the threshold.
ebe_warm_start false When an inner per-subject EBE solve fails its BFGS step and falls back to Nelder–Mead, warm-start the simplex from the BFGS partial η̂ rather than cold-starting from the prior mode η=0. A weakly-identified η (e.g. an unidentifiable peripheral volume) drives BFGS far out onto the steep prior slope, and NM slides down it to the mode in far fewer iterations than refining from 0 in the flat basin — substantially fewer prediction walks on fallback-heavy fits (≈1.7× on a 2-cpt unidentifiable-V2 benchmark). Opt-in: warm-starting moves the fallback subjects’ EBEs, which perturbs the outer optimiser’s trajectory — harmless for the derivative-free BOBYQA default, but it can derail a gradient-based outer optimiser (e.g. mma) into a worse basin on some models. Enable only after validating OFV/estimates on your model + optimizer; leave false for the historical cold-restart behaviour.

Options That Don’t Apply to the Selected Method

If you set an option that the chosen estimation method doesn’t consume (e.g. n_convergence with method = focei, or optimizer with method = saem), fit() emits a warning listing the option, the selected method, and the keys that are available for that method. The option is ignored — the fit still proceeds.

For a chained fit (method = [saem, focei]), an option is kept if it applies to any stage in the chain, so SAEM and FOCEI keys can be mixed without warnings.

Multi-Start Optimization

Key Default Description
n_starts 1 Number of independent optimization runs. 1 disables multi-start (no behaviour change). When > 1, all starts run in parallel via rayon; the converged run with the lowest OFV is returned. Start 0 always uses the exact initial values from the model file.
start_sigma 0.3 Log-space perturbation applied to initial theta values for starts 1..n. Log-packed thetas are multiplied by exp(N(0, start_sigma)); thetas with negative lower bounds are shifted additively.
multi_start_seed 42 RNG seed for the multi-start theta perturbations. Independent of seed (SAEM) so that changing the SAEM seed does not silently alter which perturbed starting points are used in FOCE multi-start runs.

Multi-start is most useful for models prone to local minima: nonlinear elimination (Michaelis-Menten), full-block omega, or many covariate parameters. On an 8-core machine n_starts = 8 costs the same wall-clock time as a single run but has ~8× lower probability of a local minimum.

Examples

Standard FOCEI with defaults:

[fit_options]
  method     = focei
  maxiter    = 300
  covariance = true

FOCEI with global search:

[fit_options]
  method        = focei
  maxiter       = 500
  covariance    = true
  global_search = true

SAEM with custom settings:

[fit_options]
  method        = saem
  n_exploration = 200
  n_convergence = 300
  n_mh_steps    = 5
  seed          = 42
  covariance    = true

FOCEI with SIR uncertainty:

[fit_options]
  method        = focei
  covariance    = true
  sir           = true
  sir_samples   = 1000
  sir_resamples = 250
  sir_seed      = 42

Derivative-free BOBYQA fit:

[fit_options]
  method        = foce
  optimizer     = bobyqa
  maxiter       = 300
  inner_maxiter = 100
  inner_tol     = 1e-6

Trust-region with tuned CG subproblem:

[fit_options]
  method             = foce
  optimizer          = trust_region
  maxiter            = 200
  steihaug_max_iters = 30

FOCE with Inter-Occasion Variability:

[fit_options]
  method     = foce
  iov_column = OCC
  covariance = true

Enable optimizer trace and EBE guard:

[fit_options]
  method                        = foce
  optimizer_trace               = true
  max_unconverged_frac          = 0.5
  min_obs_for_convergence_check = 3

Optimizer Trace

When optimizer_trace = true, a CSV is written to /tmp/ferx_trace_<pid>_<ts>.csv and the path is stored in FitResult::trace_path. Each row is one outer iteration.

Column Populated by Description
iter all Iteration number
method all foce, focei, gn, gn_hybrid, saem
phase gn_hybrid, saem focei (polish) or explore/converge
ofv all Objective function value
wall_ms all Wall time for this iteration (ms)
grad_norm BFGS, NLopt gradient-mode Gradient norm
step_norm BFGS Step size
inner_iter_count (reserved) Reserved for future per-iteration inner-loop count; currently NA
optimizer FOCE/FOCEI Active NLopt algorithm
lm_lambda GN Levenberg-Marquardt damping factor
ofv_delta GN Change in OFV from previous iteration
step_accepted GN Whether the GN step was accepted
cond_nll SAEM Conditional observation NLL
gamma SAEM SAEM step-size (1 in exploration, 1/k in convergence)
mh_accept_rate SAEM Mean acceptance rate across all subjects (MH or HMC). In mixed HMC/MH runs (n_leapfrog > 0 with some MH-fallback subjects) this is an aggregate across both samplers.
n_ebe_unconverged FOCE/FOCEI Subjects whose inner optimizer did not converge
n_ebe_fallback FOCE/FOCEI Subjects that fell back to Nelder-Mead

Unused columns contain NA. The trace is buffered and flushed when the fit ends; if a run is killed mid-iteration the buffered tail may be lost.