Fit Options

The optional [fit_options] block configures the estimation method and optimizer settings.

Syntax

[fit_options]
  key = value

General Options

Key	Values	Default	Description
`method`	`foce`, `focei`, `saem`, `gn`, `gn_hybrid`, `impmap`, `imp` (only as final chain stage)	`focei`	Estimation method (or single-stage method in a chain). See chained fits. When neither this key nor a wrapper’s `method` argument is set, the engine falls back to `focei` and emits a warning so the implicit choice is visible — set `method` here to silence it.
`maxiter`	integer	`500`	Maximum outer loop iterations. `maxiter = 0` is evaluation only (NONMEM `MAXEVAL=0`): the engine runs one inner EBE solve at the initial parameters and reports the objective there with no outer optimisation — useful for scoring a fixed parameter set or computing standard errors at known estimates (with `covariance = true`). The reported fit is marked not-converged, and the parameters are returned unchanged.
`covariance`	`true`, `false`	`true`	Compute covariance matrix and standard errors
`covariance_method`	`r`, `s`, `rsr`	`r`	Which covariance estimator to assemble, mirroring NONMEM `$COVARIANCE MATRIX=`. `r` = inverse Hessian `R⁻¹` (model-based; the default). `s` = inverse score cross-product `S⁻¹` (empirical information). `rsr` = the Huber–White sandwich `R⁻¹SR⁻¹` (robust to model mis-specification; NONMEM’s default). ferx defaults to `r`, but NONMEM’s `$COVARIANCE` default is `rsr` — set `covariance_method = rsr` when reconciling SEs against a NONMEM run that used the default `$COV`. `s`/`rsr` are supported for FOCEI, FOCE, and IOV fits; all three anchored against NONMEM `$COV MATRIX=S`/`RSR` within ~10%. All methods require a positive-definite FD Hessian (a non-PD Hessian is not yet routed to the `s` cross-product). Requires `covariance = true`.
`covariance_fallback`	`none`, `sir`	`none`	What to do when the FD Hessian is not positive definite (non-PD). `none` leaves the covariance step as failed. `sir` runs SIR with a fallback proposal covariance built from the `\\|eigenvalue\\|`-rectified Hessian, inflated 4×; the YAML reports SIR-based 95% CIs instead of Hessian SEs. Requires `covariance = true`. Ignored when the Hessian succeeds.
`fd_hessian_step`	float	`1e-2`	Initial relative step size for the finite-difference Hessian used in the covariance step. The actual perturbation for parameter `i` is `fd_hessian_step * (1 + \|x_hat[i]\|)`. ferx automatically halves this value up to 8 times if any diagonal stencil evaluates to a non-finite value, so in most cases the default requires no tuning. Set explicitly only if the automatic reduction is insufficient (try `0.1`) or if FD noise is the main concern on a very smooth surface (try `1e-3`). Must be positive and finite.
`optimizer`	`auto`, `bobyqa`, `slsqp`, `nlopt_lbfgs`, `mma`, `trust_region` (plus deprecated aliases `bfgs`/`lbfgs` → `nlopt_lbfgs`)	`auto`	Outer-loop optimisation algorithm. Applies to `method = foce` / `focei` and to the FOCEI polish phase of `method = gn_hybrid`. The default `auto` picks per model: `nlopt_lbfgs` when the exact analytic FOCE/FOCEI gradient is available, and `bobyqa` when only finite differences are (ODE/PD models, LTBS/SDE, or `gradient = fd`) — see the Outer Optimizers page for the applicability matrix and the benchmarking (#490) behind these defaults. The fit output records the resolved pick (e.g. `auto (nlopt_lbfgs)`). Ignored (with a runtime warning) under `method = saem` / `imp` / `gn`. `bfgs` and `lbfgs` are deprecated aliases that now select `nlopt_lbfgs` (the hand-rolled built-in BFGS/L-BFGS is slated for removal — see #483).
`outer_xtol`	float	`1e-4`	Relative step tolerance for the derivative-free `bobyqa` outer optimizer (NLopt `xtol_rel`): how far its trust radius must shrink to declare success. `1e-4` in scaled log-space is a ~0.01% move on the natural parameter scale. Only affects `bobyqa` (the `auto` pick for FD-objective models — ODE/PD, LTBS/SDE, `gradient = fd`, and TTE); gradient-based optimizers use their own fixed internal tolerances. Must be positive and finite.
`outer_ftol`	float	auto	Relative objective tolerance for the derivative-free `bobyqa` outer optimizer (NLopt `ftol_rel`): the relative OFV change below which it stops. Unset = auto: `1e-8` for a pure-TTE model and `1e-6` otherwise. The TTE tightening fixes #469 — on the near-flat frailty-ω² ridge of a nonlinear hazard parameter (the whole ridge spans <0.01 OFV) the historical `1e-6` stopped the optimizer short, leaving the variance above its own minimum (a Weibull shape-frailty read ω² 0.204 vs the NONMEM/nlmixr2 0.175 consensus; at `1e-8` it lands 0.176). The `1e-6` floor for non-TTE fits is deliberate: a TTE hazard objective is evaluated exactly, but on a noisy objective (ODE solver error, FD-inner FOCE such as LTBS) `1e-8` is unreachable, so BOBYQA grinds toward its maxeval budget instead of converging (≈3× the evaluations on an ODE fit). Set an explicit value to pin it for every model — e.g. `outer_ftol = 1e-8` to tighten an LTBS flat-ridge fit, accepting the cost. Must be positive and finite.
`inner_maxiter`	integer	`200`	Max iterations for the inner (per-subject EBE) optimizer
`inner_optimizer`	`auto`, `bfgs`, `lbfgs`, `nelder_mead`	`auto`	Inner-loop (per-subject EBE) optimisation algorithm. `auto` uses dense BFGS, switching to limited-memory L-BFGS above 32 random effects. The explicit values pin a single algorithm with no automatic switching: `bfgs` (dense), `lbfgs` (limited-memory two-loop recursion — cheaper per step at high random-effect dimension), `nelder_mead` (derivative-free, for diagnosing gradient issues). All gradient-based variants converge to the same EBE; the choice trades per-step cost against memory. Applies to every method with an inner loop (`foce`/`focei`/`gn`/`gn_hybrid`/`saem`/`impmap`).
`inner_tol`	float	`1e-5`	Gradient-norm convergence tolerance for the inner (per-subject EBE) optimizer. A looser tolerance leaves residual noise in each subject’s EBE solution, which propagates into the marginal objective the outer optimizer sees; on models with a noisy or flat marginal surface (FD-inner FOCE such as log-transform-both-sides) that noise can make the derivative-free BOBYQA outer optimizer false-converge above the true minimum. `1e-5` removes enough of that noise to match NONMEM’s minimum, at roughly 1.5× the per-fit cost of the older `1e-4` default. (For reference, NONMEM runs its inner conditional optimization far tighter than its outer convergence: the inner precision `SIGL` defaults to ~10 significant digits, versus the outer `NSIG`/SIGDIGITS default of 3.) Tighter is not uniformly better: at `1e-6` some ill-conditioned fits over-converge the inner Hessian and the outer optimizer can land in a worse basin. Loosen it (e.g. `1e-4`) to speed up well-conditioned fits where it makes no difference; change it further only with validation against a reference fit.
`cov_inner_tol`	float	`inner_tol` (LTBS: `min(inner_tol, 1e-8)`)	Inner EBE-reconvergence tolerance used only by the covariance step, decoupled from `inner_tol`. The covariance R-matrix is a second-difference of the reconverged OFV and is more sensitive to EBE precision than the fit itself, so on a flat or weakly-identified surface an EBE converged only to the fit’s `inner_tol` can leave enough residual noise to perturb the standard errors (e.g. the heavily-censored M3 + IOV case in #654 — set `cov_inner_tol = 1e-11`). This lets the covariance step reconverge tighter without slowing every outer iteration. Unset (default) uses `inner_tol` for ordinary models (SEs unchanged) and `min(inner_tol, 1e-8)` for LTBS models, whose `g = ln(f)` covariance Hessian needs the tighter reconvergence. The default is deliberately not tightened for every model — over-converging some ill-conditioned inner Hessians (e.g. IOV block-Ω) drives the covariance indefinite — so opt in per model with this key.
`steihaug_max_iters`	integer	adaptive	Max CG iterations for the Steihaug subproblem (only used when `optimizer = trust_region`). Default (unset) uses `ceil(sqrt(n_params)).clamp(5, n_params)` — typically 5 for standard NLME models. Set explicitly (e.g. `steihaug_max_iters = 50`) to pin the budget.
`ode_reltol`	float	`1e-4`	RK45 ODE solver relative tolerance. ODE models only (ignored for analytical PK). The default reproduces analytical closed forms in PRED to ~`1e-4`, but the FOCE objective amplifies solver error, so the OFV of an ODE-form model can differ from its analytical equivalent by several units. Set a tighter value (e.g. `1e-10`) when the ODE-form OFV must match an analytical reference; expect slower fits.
`ode_abstol`	float	`1e-6`	RK45 ODE solver absolute tolerance (companion to `ode_reltol`). ODE models only.
`ode_max_steps`	integer	`10000`	Max RK45 steps per integration segment. ODE models only. Raise (e.g. `1000000`) if a tight `ode_reltol` exhausts the step budget on stiff multi-compartment systems.
`global_search`	`true`, `false`	`false`	Run NLopt CRS2-LM (Controlled Random Search with Local Mutation) as a gradient-free global pre-search before the local optimizer. CRS2-LM samples within the parameter bounds; the local optimizer (e.g. `bobyqa`, `slsqp`) starts from the best point found. Useful for poorly-identified models — when the local optimizer can land in a degenerate basin (collapsed ETA, V/Q swap, parameters at bounds) from a far-from-truth start, the global pre-search usually escapes it. Adds the pre-search budget on top of the local optimisation, but typically more efficient than running multiple full fits from scratch. Requires a full NLopt build (e.g. `brew install nlopt` or `apt install libnlopt-dev`); a clear warning is emitted if CRS2-LM is unavailable.
`global_maxeval`	integer	`200 * (n_params + 1)`	Maximum evaluations of the FOCE objective during the global pre-search. Each eval is a full inner-loop pass over all subjects, so this is the dominant cost of `global_search = true`. The default (`0` → auto) is empirically enough to escape bad basins on 10–20 parameter PK models without dominating the wall time of the subsequent local refine.
`bloq_method`	`drop`, `m3`	`drop`	How to handle rows with `CENS=1` (below LLOQ) or `CENS=-1` (above ULOQ). `m3` enables Beal’s M3 likelihood (see BLOQ example).
`npde_nsim`	integer ≥ 0	`0`	Number of Monte-Carlo replicates per subject used to compute the simulation-based NPDE/NPD diagnostics after the fit. `0` (default) disables the computation — no `NPDE`/`NPD` columns are emitted. A typical value is `1000`. Cost scales linearly with the count.
`npde_seed`	integer	—	RNG seed for the NPDE/NPD simulation, for reproducible diagnostics. Unset falls back to a fixed default. Only used when `npde_nsim > 0`.
`mu_referencing`	`true`, `false`	`true`	Re-centre inner-loop ETA estimates on the current population mean (auto-detected from `[individual_parameters]`). See the FAQ entry for details. Set `false` to reproduce pre-automatic-mu behaviour.
`iov_column`	string	—	Name of the occasion column in the dataset (e.g. `OCC`). Required when the model uses `kappa` or `block_kappa` declarations. The column must contain integer occasion indices. Case-insensitive. Only supported with `foce` / `focei` — not `saem`. See IOV documentation.
`optimizer_trace`	`true`, `false`	`false`	Write a per-iteration CSV to `/tmp/ferx_trace_<pid>_<ts>.csv`. The path is stored in `FitResult::trace_path`. Useful for diagnosing convergence problems or comparing optimizers. See Optimizer Trace.
`reconverge_gradient_interval`	integer ≥ 0	`0`	How often to re-solve each subject’s inner EBE loop during the population gradient instead of holding the EBEs (η̂) and FOCE Hessian fixed. `0` (default) never reconverges — the cheap fixed-EBE gradient is used. `1` reconverges on every gradient evaluation; `N` reconverges on evaluations `0, N, 2N, …` and uses the cheap gradient in between (amortizing the cost by ~`N`× while still periodically correcting the search direction). The fixed-EBE gradient omits the inner solution’s response to θ/Ω, so on ill-conditioned fits a gradient optimizer (e.g. `slsqp`) can stall well above the derivative-free (`bobyqa`) optimum; reconverging recovers the full surface at roughly 5–6× the per-gradient cost at interval `1`. IOV models (`kappa`/`block_kappa`) always reconverge and ignore this setting. See the FOCE/FOCEI page.
`inits_from_nca`	`true`, `false`, `nca`, `nca_sweep`, `nca_ebe`	`false`	Derive NCA-based starting values from the data before the optimizer loop. `true` (alias for `nca_sweep`) and `false`/`off` toggle the default strategy; the three named values pick a strategy explicitly (see NCA-based starting values). Fixed thetas are never overwritten; covariate-effect thetas (no mu-referencing link) keep the model default. Most useful with `method = trust_region` or `method = gn` / `method = gn_hybrid`, where bad starting values can cause early stalling. The same estimation is available without running a fit via the CLI flag `--inits-from-nca[=METHOD]` and (in ferx-r) the `ferx_inits_from_nca()` function.

NCA-based starting values

inits_from_nca estimates starting values directly from the data using non-compartmental analysis (NCA), then optionally refines parameters NCA cannot estimate. All three strategies are NCA-based; they differ only in how much refinement runs on top:

Value	Strategy	What it does	Typical cost
`nca`	NCA only	Per-subject NCA (AUC, terminal slope, Wagner–Nelson Ka, biexponential peeling for 2/3-cpt) pooled to population geometric means. Leaves parameters NCA can’t reach (peripheral Q/V2, all ODE/PD thetas) at the model default.	< 5 ms
`nca_sweep`	NCA + sweep	Runs `nca`, then sweeps every remaining non-fixed theta over a log-space rRMSE grid using population predictions (etas = 0). Model-agnostic — also covers ODE/PD models. This is what `true` selects.	< 500 ms (analytical)
`nca_ebe`	NCA + EBE sweep	Like `nca_sweep` but evaluates the grid with empirical Bayes estimates (etas ≠ 0); more accurate under large IIV (omega > ~0.2). Falls back to `nca_sweep` for ODE models.	< 500 ms (analytical)

The CL eta’s omega is also updated from inter-subject CV² when ≥ 3 subjects have a valid NCA estimate.

When nca_sweep is enabled but the fit fails to converge or the OFV looks suspiciously high, try nca_ebe.

Simulation-based diagnostics (NPDE / NPD)

Setting npde_nsim > 0 adds two simulation-based goodness-of-fit columns to the sdtab output: NPD (Normalized Prediction Discrepancies) and NPDE (Normalized Prediction Distribution Errors). Unlike CWRES — which linearises the model around the conditional mode and so inherits the bias of a first-order approximation — NPDE/NPD are built entirely from Monte-Carlo simulation under the fitted model, so they are robust to model nonlinearity and to non-Gaussian random effects (Brendel et al. 2006; Comets et al. 2008, the npde R package).

[fit_options]
method = focei
npde_nsim = 1000     # replicates per subject (off when 0, the default)
npde_seed = 12345    # optional, for reproducibility

The effective seed actually used — the explicit npde_seed when given, otherwise the built-in default — is recorded as model: npde_seed: in the {model}-fit.yaml output (and round-trips through .fitrx), so the NPDE/NPD columns can always be regenerated from the saved fit, even when no seed was set in the model file. The line is omitted when NPDE did not run (npde_nsim = 0).

For each observation y_ij, ferx simulates K = npde_nsim replicates under the fitted θ/Ω/Σ (sampling η ~ N(0, Ω) and the residual error), evaluated at the subject’s own observation design:

NPD (no decorrelation): pd_ij is the empirical CDF of the simulated values at observation ij, and npd_ij = Φ⁻¹(pd_ij).
NPDE (decorrelated): within each subject the observed and simulated vectors are decorrelated with the empirical mean and Cholesky factor of the simulated covariance (the Brendel/Comets procedure) before the empirical CDF and inverse- normal transform. This removes the within-subject correlation that NPD ignores.

Empirical-CDF probabilities at 0 or 1 are clamped to [1/(2K), 1 − 1/(2K)] before Φ⁻¹, matching the npde-package convention, so the scores stay finite.

Under a correctly specified model, NPDE follows a standard normal distribution. On the warfarin example (examples/warfarin.ferx, data/warfarin.csv), a converged FOCEI fit with npde_nsim = 1000 gives the expected mean-zero, unit-variance distribution:

Diagnostic	mean	variance	reference
NPDE	0.006	1.02	≈ N(0, 1)
NPD	−0.002	0.88	mean ≈ 0; variance < 1 because NPD retains within-subject correlation

This matches the npde R package’s autonpde() (same Cholesky decorrelation and edge-clamping) to Monte-Carlo noise. See tests/npde_validation.rs for the engine-side check and an R reproduction recipe.

NONMEM cross-validation

NONMEM has no native NPDE output; the reference pipeline is a NONMEM $SIMULATION post-processed by the npde R package. The same warfarin data was fit with NONMEM 7.5.1 (ADVAN2 TRANS2, METHOD=1 INTER, proportional error) and the two fits agree, so any difference in the diagnostics is attributable to the NPDE computation rather than the fit:

Quantity	ferx (FOCEI)	NONMEM 7.5.1
TVCL	0.1328	0.1327
TVV	7.738	7.738
TVKA	0.817	0.811
OFV	−286.0015	−286.0042

Feeding a 1000-replicate NONMEM $SIMULATION (under the NONMEM estimates) and the observed data to npde::autonpde() (Cholesky decorrelation, edge-clamping) reproduces ferx’s NPDE/NPD on the same 110 observations:

Diagnostic	ferx mean	ferx var	npde-pkg mean	npde-pkg var
NPDE	0.005	1.09	0.036	1.04
NPD	−0.009	0.91	−0.004	0.91

The NPD variance (the quantity that should sit below 1 because NPD retains the within-subject correlation) matches to three figures (0.91 vs 0.91); the small NPDE-variance gap is Monte-Carlo noise between two independent simulation draws. Both engines give NPDE ≈ N(0, 1) and a mean-zero NPD, as expected under this well-specified model.

Limitations. M3/BLQ censored observations need the predictive-CDF variant and are out of scope: censored rows (CENS != 0) are emitted as empty/NaN (matching the CWRES/IWRES convention), and a subject’s NPDE is NaN as a whole when it has any censored row, since the within-subject decorrelation would otherwise mix the LLOQ into the uncensored rows. NPDE also requires more replicates than a subject has observations (npde_nsim > n_obs) for a full-rank simulated covariance; otherwise that subject’s NPDE is NaN (NPD is still computed). For IOV (kappa) models the simulation holds kappas at zero (the existing simulate() convention), so NPDE/NPD omit the inter-occasion component of the predictive variance.

Estimation Methods

FOCEI (default)

method = focei

FOCE with Interaction. Includes the dependence of the residual error on random effects. More accurate than FOCE when the error model depends on individual predictions, but slightly slower.

FOCE

method = foce

First-Order Conditional Estimation. Linearizes the model around the empirical Bayes estimates. Fast and reliable for most models.

SAEM

method = saem

Stochastic Approximation EM. Uses Metropolis-Hastings sampling instead of MAP optimization for random effects. More robust to local minima; recommended for complex models with many random effects.

SAEM-Specific Options

Key	Default	Description
`n_exploration`	`150`	Phase 1 iterations (step size = 1)
`n_convergence`	`250`	Phase 2 iterations (step size = 1/k)
`n_mh_steps`	`20`	Block Metropolis-Hastings steps per subject per iteration. Also sizes the componentwise decorrelating kernel that prevents block-Ω collapse (`max(2, n_mh_steps / n_eta)` sweeps; multi-η models only — skipped when `n_eta < 2`). When `n_leapfrog > 0`, this applies to subjects that fall back to MH (see below); HMC subjects use one proposal per iteration regardless.
`n_leapfrog`	`0`	Leapfrog steps per HMC proposal (0 = use MH; see below). When > 0, subjects for which HMC is unavailable (ODE model, missing analytical PK path, non-finite Ω, unsupported TV-cov path) fall back to MH using `n_mh_steps` proposals.
`adapt_interval`	`50`	Iterations between step-size adaptation
`omega_burnin`	`20`	Initial exploration iterations during which Ω (and Ω_IOV) are held at their starting values while the MH chain warms up. Clamped to `n_exploration`; set `0` to disable. Prevents the Ω collapse described in the SAEM page.
`seed`	`12345`	RNG seed for reproducibility
`conddist`	`false`	Run a post-fit conditional-distribution pass estimating each subject’s `p(η_i \\| y_i)` by MCMC. Reports per-subject conditional mean/SD and distribution-based η-shrinkage (the shrinkage-unbiased basis for η diagnostics). See the SAEM page.
`conddist_nsamp`	`200`	Retained MCMC draws per subject in the conditional-distribution pass. Larger tightens the mean/SD estimates at linear cost. Production diagnostics want values in the thousands. Only used when `conddist = true`.
`conddist_burnin`	`20`	Burn-in draws discarded before accumulation, to forget the EBE-mode warm start. Only used when `conddist = true`.
`conddist_keep_samples`	`false`	Retain the raw per-subject draws (written to `{model}-conddist-samples.csv`), not just the mean/SD. Only used when `conddist = true`.

SIR (Sampling Importance Resampling)

SIR provides non-parametric parameter uncertainty estimates as an optional post-estimation step. Requires covariance = true.

Key	Default	Description
`sir`	`false`	Enable SIR uncertainty estimation
`sir_samples`	`1000`	Number of proposal samples (M)
`sir_resamples`	`250`	Number of resampled vectors (m)
`sir_seed`	`12345`	RNG seed for reproducibility
`sir_keep_samples`	`false`	Retain resampled parameter vectors for `simulate_with_uncertainty()`
`sir_df`	`5.0`	Degrees of freedom for the Student-t proposal; higher values approach a normal proposal

See SIR documentation for details.

Importance Sampling (IMP)

By default imp is a Monte-Carlo EM estimator (NONMEM METHOD=IMP): it maximises the importance-sampled marginal likelihood, updating θ/Ω/σ each iteration. Set imp_eval_only = true (NONMEM EONLY=1) to instead evaluate the marginal −2 log L at the fixed input parameters — a lower-bias −2 log L than the FOCE/Laplace OFV when subject posteriors of η are non-Gaussian (e.g. sparsely-sampled PK).

Behaviour change: imp previously only evaluated −2 log L. It now estimates by default. Add imp_eval_only = true to recover the old behaviour.

[fit_options]
  method        = imp            # estimate (NONMEM METHOD=IMP)
  imp_iterations = 200
  imp_samples    = 1000
  imp_averaging  = 50
  imp_proposal_df = 5             # or `normal` for a multivariate-normal proposal
  imp_seed       = 12345

[fit_options]
  method        = [focei, imp]   # evaluate FOCEI's fit (NONMEM EONLY=1)
  imp_eval_only  = true

On rich data prefer impmap or warm-start with [focei, imp]; plain imp’s one-iteration-lagged proposal can collapse the ESS on a sharp posterior. See the IMP documentation.

Key	Default	Description
`imp_eval_only`	`false`	`true` ⇒ evaluate `−2 log L` at fixed parameters (NONMEM `EONLY=1`); must be the terminal chain stage. `false` ⇒ estimate (NONMEM `METHOD=IMP`).
`imp_iterations`	`200`	MCEM iterations (estimator only).
`imp_averaging`	`50`	Terminal iterations averaged into the reported estimate (estimator only).
`imp_samples`	`1000`	Importance samples K per subject. 2000–5000 recommended for publication-quality MC SE.
`imp_proposal_df`	`5.0`	Student-t proposal degrees of freedom (≥ 1), or `normal`/`mvn` for a multivariate-normal proposal. Lower = heavier tails.
`imp_auto`	`true`	Adaptive sample count (NONMEM `AUTO`). When `true`, `imp_samples` is the starting count and is ramped up (×2/iteration, cap 10000) while the objective’s Monte-Carlo SE exceeds 1.0. Recommended for high-dimensional / FREM models, where a fixed count biases the M-step.
`imp_seed`	`12345`	RNG seed. Same seed → identical result.
`imp_low_ess_threshold`	`0.1`	Subjects with normalized ESS below this fraction get flagged in the result. Set `0` to silence.
`imp_defensive_alpha`	`0.0`	Defensive-mixture weight (issue #528), opt-in. Each subject draws this fraction of its samples from the prior `N(0, Ω)` rather than the mode-centred proposal, and every sample is scored under the mixture density. This bounds the importance weights so a weakly-identified subject — e.g. an analytical `[initial_conditions]` baseline whose `V` cancels in the amplitude — can’t hijack the weighted M-step and walk θ to the bounds. Must be in `[0, 1)`; the default `0` is the legacy single-proposal sampler (bit-comparable with NONMEM), set a small positive value (e.g. `0.1`) to enable the rescue. Applies to `imp` and `impmap` (including the FREM Rao-Blackwell path); for an `impmap` stage you may also write `impmap_defensive_alpha`. Enabling it disables Sobol QMC and raises the per-subject ESS floor (so `imp_low_ess_threshold` flags fewer subjects). See Importance Sampling → Defensive mixture.

See Importance Sampling documentation for the algorithm, the NONMEM mapping, IOV caveats, and tuning guidance.

IMPMAP (`importance_sampling_map`)

Like the estimating imp, impmap is a Monte-Carlo EM estimator — equivalent to NONMEM METHOD=IMPMAP. The difference is the proposal: impmap re-centers at the freshly-computed conditional mode (MAP) every iteration (robust on rich data), whereas imp re-centers from the previous iteration’s sample moments (cheaper, but fragile on rich data). Both update θ/Ω/σ from the importance-weighted posterior moments. It runs standalone or as a chain stage:

[fit_options]
  method             = importance_sampling_map   # alias: impmap
  impmap_iterations  = 200
  impmap_samples     = 300
  impmap_proposal_df = 4          # Student-t (default); `normal` for MVN
  impmap_seed        = 12345

Key	Default	Description
`impmap_iterations`	`200`	Number of MCEM iterations (parameter updates).
`impmap_samples`	`300`	Importance samples K per subject per iteration. Larger K reduces Monte-Carlo noise at linear cost.
`impmap_proposal_df`	`4`	Proposal degrees of freedom. A finite value ≥ 1 gives a heavier-tailed Student-t (default `4`); `normal` (or `mvn`) gives a multivariate-normal proposal (NONMEM’s default). The Gaussian’s lighter tails under-cover the posterior of weakly-identified parameters and bias the M-step moments, so ferx defaults to a Student-t.
`impmap_auto`	`true`	Adaptive sample count (NONMEM `AUTO`). When `true`, `impmap_samples` is the starting count and is ramped up (×2/iteration, cap 10000) while the objective’s Monte-Carlo SE exceeds 1.0 (NONMEM `STDOBJ`). Strongly recommended for FREM / high-dimensional models — a fixed count leaves a sample-count-dependent bias in the typical-value and Ω estimates.
`impmap_averaging`	`50`	Final iterations whose parameters are averaged into the reported estimate (Monte-Carlo variance reduction).
`impmap_seed`	`12345`	RNG seed. Same seed → identical estimates.
`impmap_low_ess_threshold`	`0.1`	Subjects with normalized ESS below this fraction are flagged as poorly sampled.
`impmap_trace`	`false`	When `true`, collect per-iteration parameter values into `FitResult.impmap_trace` — analogous to NONMEM `.ext` output for traceplots.
`impmap_mceta`	`0`	Number of additional random starting points for per-subject MAP optimization (analogous to NONMEM `MCETA`). Each start draws η from N(0, Ω). The start with the lowest individual NLL wins. `0` = single warm-start (default). `3` is a good choice for high-dimensional models (e.g. FREM with ≥ 5 ETAs).
`impmap_sobol`	`false`	Use Sobol quasi-random sequences (with Cranley-Patterson randomization) for IS draws instead of pseudo-random. Gives more uniform posterior coverage with fewer samples. Only applies to MVN proposals (`impmap_proposal_df = normal`); Student-t falls back to pseudo-random.
`iscale_min`	`0.1`	Minimum proposal scaling factor for adaptive IS (NONMEM `ISCALE_MIN`). The IS proposal covariance is multiplied by `s²` where `s` is chosen from `[iscale_min, iscale_max]` to maximise per-subject ESS. Set both to `1.0` to disable.
`iscale_max`	`10.0`	Maximum proposal scaling factor (NONMEM `ISCALE_MAX`).
`frem_rao_blackwell`	`true`	FREM only: Rao-Blackwellise the covariate ETAs (integrate them analytically, importance-sample only the PK ETAs). Strongly recommended — brute-force sampling of the near-singular covariate dimensions has very poor ESS. Set `false` only to diagnose the RB path against the full-dimensional sampler.

impmap reuses inner_maxiter / inner_tol for the per-iteration MAP step. Inter-occasion variability ([iov] / kappa) and SDE ([diffusion]) models are not yet supported and are refused up front. See the IMPMAP documentation for the algorithm and the NONMEM comparison.

Optimizer Choices

Optimizer	Description	Recommended For
`auto`	Picks `nlopt_lbfgs` (analytic gradient available) or `bobyqa` (FD only) per model	Default; the recommended choice for most fits
`bobyqa`	NLopt BOBYQA — derivative-free quadratic interpolation	Robust on noisy / non-smooth FOCE surfaces (ODE/PD, sparse data, Hill-ridge models); what `auto` selects there
`slsqp`	Sequential Least Squares Programming (NLopt)	Smooth, well-conditioned analytical PK models where you want gradient-based convergence; pair with `reconverge_gradient_interval` if it stalls
`nlopt_lbfgs`	NLopt L-BFGS	Analytic-gradient FOCE/FOCEI models (fastest-to-optimum in #490 benchmarking); large parameter spaces; what `auto` selects there
`lbfgs`	Deprecated alias → `nlopt_lbfgs`	— (was the built-in L-BFGS; see #483)
`bfgs`	Deprecated alias → `nlopt_lbfgs`	— (was the built-in BFGS)
`mma`	Method of Moving Asymptotes (NLopt)	Constrained problems
`trust_region`	Newton trust-region with Steihaug CG subproblem (argmin)	Well-conditioned problems where second-order curvature helps convergence

Notes: - auto (the default) chooses between nlopt_lbfgs and bobyqa from the model: when the exact analytic FOCE/FOCEI gradient is available (analytical PK model, in the sensitivity provider’s scope, and gradient not forced to fd) it uses the gradient-based nlopt_lbfgs; otherwise — ODE/PD models, LTBS/SDE, or gradient = fd, where the outer loop runs on finite differences — it falls back to the derivative-free bobyqa. Benchmarking across ~10 real FOCEI datasets (#490) found nlopt_lbfgs fastest-to-optimum on every analytic-gradient problem and bobyqa both fastest and most reliable on the FD-only problems. The fit output records the resolved optimizer as auto (<resolved>). Set optimizer explicitly to override the choice. - bobyqa does not use gradients, so it is robust to small discontinuities in the FOCE surface caused by EBE re-estimation, but it converges more slowly than gradient-based methods on smooth problems. - trust_region uses the analytic outer gradient (same subject_nll_pop_grad as the outer FOCE optimizer) and a BHHH approximate Hessian (H ≈ 4 Σ gᵢgᵢᵀ). The BHHH matrix is always positive semi-definite, so the Steihaug subproblem is well-conditioned even near constraints. The Steihaug CG budget defaults to ceil(sqrt(n_params)) — typically 5 for standard NLME models, which is far cheaper than the previous FD-Hessian path (O(n²) OFV evaluations per Hessian).

Parameter Scaling and EBE Convergence

Key	Default	Description
`parameter_scaling`	`auto`, `none`, `abs`, `rescale2`	`auto`
`scale_params`	`false`	Legacy boolean alias for `parameter_scaling = abs`. Divide each packed (log/Cholesky) coordinate by its initial magnitude before passing it to the optimizer. Off by default since issue #99: the scaling-enabled path only ever runs on log/Cholesky coordinates, where dividing by `\\|log value\\|` is counterproductive — e.g. `ln(V)=ln(20)≈3` gets scale 3, turning the optimizer’s unit step into a ≈20× multiplicative jump in V, which overshoots and (via the uniform SLSQP gradient cap) starves the step in other dimensions such as OMEGA, halting short of the minimum. Prefer `parameter_scaling = rescale2` for gradient-based optimizers. The OFV value is unchanged at any fixed point, but the optimizer trajectory and stop point are not.
`max_unconverged_frac`	`0.1`	Fraction of subjects (with at least `min_obs_for_convergence_check` observations) allowed to have unconverged EBEs before the outer optimizer rejects the step (returns OFV = ∞). Set to `1.0` to disable the guard.
`min_obs_for_convergence_check`	`2`	Subjects with fewer than this many observations are excluded from the `max_unconverged_frac` check (they still run normally).
`stagnation_guard`	`true`	Short-circuit the NLopt-based outer optimizers once recent evals show no OFV improvement above 1e-3 over a window of `3*(n+1).max(50)` evals. This lets SLSQP / L-BFGS terminate quickly via their own xtol/ftol on numerically-flat plateaus (e.g. γ-bearing FOCEI problems) instead of grinding through the remaining `outer_maxiter` budget at full inner-loop cost. Set to `false` to let the optimizer run to its natural termination criterion — useful when debugging or for problems with very slow but real OFV improvements below the threshold.
`ebe_warm_start`	`false`	When an inner per-subject EBE solve fails its BFGS step and falls back to Nelder–Mead, warm-start the simplex from the BFGS partial η̂ rather than cold-starting from the prior mode η=0. A weakly-identified η (e.g. an unidentifiable peripheral volume) drives BFGS far out onto the steep prior slope, and NM slides down it to the mode in far fewer iterations than refining from 0 in the flat basin — substantially fewer prediction walks on fallback-heavy fits (≈1.7× on a 2-cpt unidentifiable-V2 benchmark). Opt-in: warm-starting moves the fallback subjects’ EBEs, which perturbs the outer optimiser’s trajectory — harmless for the derivative-free BOBYQA default, but it can derail a gradient-based outer optimiser (e.g. `mma`) into a worse basin on some models. Enable only after validating OFV/estimates on your model + `optimizer`; leave `false` for the historical cold-restart behaviour.

Options That Don’t Apply to the Selected Method

If you set an option that the chosen estimation method doesn’t consume (e.g. n_convergence with method = focei, or optimizer with method = saem), fit() emits a warning listing the option, the selected method, and the keys that are available for that method. The option is ignored — the fit still proceeds.

For a chained fit (method = [saem, focei]), an option is kept if it applies to any stage in the chain, so SAEM and FOCEI keys can be mixed without warnings.

Multi-Start Optimization

Key	Default	Description
`n_starts`	`1`	Number of independent optimization runs. `1` disables multi-start (no behaviour change). When `> 1`, all starts run in parallel via rayon; the converged run with the lowest OFV is returned. Start 0 always uses the exact initial values from the model file.
`start_sigma`	`0.3`	Log-space perturbation applied to initial theta values for starts 1..n. Log-packed thetas are multiplied by `exp(N(0, start_sigma))`; thetas with negative lower bounds are shifted additively.
`multi_start_seed`	`42`	RNG seed for the multi-start theta perturbations. Independent of `seed` (SAEM) so that changing the SAEM seed does not silently alter which perturbed starting points are used in FOCE multi-start runs.

Multi-start is most useful for models prone to local minima: nonlinear elimination (Michaelis-Menten), full-block omega, or many covariate parameters. On an 8-core machine n_starts = 8 costs the same wall-clock time as a single run but has ~8× lower probability of a local minimum.

Global Search

Setting global_search = true runs a gradient-free pre-search (NLopt CRS2-LM) before the local optimizer. This helps escape local minima on challenging datasets.

The number of global evaluations is auto-scaled based on the number of parameters and observations, or can be set explicitly with global_maxeval.

Examples

Standard FOCEI with defaults:

[fit_options]
  method     = focei
  maxiter    = 300
  covariance = true

FOCEI with global search:

[fit_options]
  method        = focei
  maxiter       = 500
  covariance    = true
  global_search = true

SAEM with custom settings:

[fit_options]
  method        = saem
  n_exploration = 200
  n_convergence = 300
  n_mh_steps    = 5
  seed          = 42
  covariance    = true

FOCEI with SIR uncertainty:

[fit_options]
  method        = focei
  covariance    = true
  sir           = true
  sir_samples   = 1000
  sir_resamples = 250
  sir_seed      = 42

Derivative-free BOBYQA fit:

[fit_options]
  method        = foce
  optimizer     = bobyqa
  maxiter       = 300
  inner_maxiter = 100
  inner_tol     = 1e-6

Trust-region with tuned CG subproblem:

[fit_options]
  method             = foce
  optimizer          = trust_region
  maxiter            = 200
  steihaug_max_iters = 30

FOCE with Inter-Occasion Variability:

[fit_options]
  method     = foce
  iov_column = OCC
  covariance = true

Enable optimizer trace and EBE guard:

[fit_options]
  method                        = foce
  optimizer_trace               = true
  max_unconverged_frac          = 0.5
  min_obs_for_convergence_check = 3

Optimizer Trace

When optimizer_trace = true, a CSV is written to /tmp/ferx_trace_<pid>_<ts>.csv and the path is stored in FitResult::trace_path. Each row is one outer iteration.

Column	Populated by	Description
`iter`	all	Iteration number
`method`	all	`foce`, `focei`, `gn`, `gn_hybrid`, `saem`
`phase`	gn_hybrid, saem	`focei` (polish) or `explore`/`converge`
`ofv`	all	Objective function value
`wall_ms`	all	Wall time for this iteration (ms)
`grad_norm`	BFGS, NLopt gradient-mode	Gradient norm
`step_norm`	BFGS	Step size
`inner_iter_count`	(reserved)	Reserved for future per-iteration inner-loop count; currently `NA`
`optimizer`	FOCE/FOCEI	Active NLopt algorithm
`lm_lambda`	GN	Levenberg-Marquardt damping factor
`ofv_delta`	GN	Change in OFV from previous iteration
`step_accepted`	GN	Whether the GN step was accepted
`cond_nll`	SAEM	Conditional observation NLL
`gamma`	SAEM	SAEM step-size (1 in exploration, 1/k in convergence)
`mh_accept_rate`	SAEM	Mean acceptance rate across all subjects (MH or HMC). In mixed HMC/MH runs (`n_leapfrog > 0` with some MH-fallback subjects) this is an aggregate across both samplers.
`n_ebe_unconverged`	FOCE/FOCEI	Subjects whose inner optimizer did not converge
`n_ebe_fallback`	FOCE/FOCEI	Subjects that fell back to Nelder-Mead

Unused columns contain NA. The trace is buffered and flushed when the fit ends; if a run is killed mid-iteration the buffered tail may be lost.