Fit options

The [fit_options] block in a .ferx file, and the arguments to ferx_fit(), control the estimation method, optimizer, convergence settings, and diagnostics. All keys have defaults — the block can be omitted entirely for a standard FOCEI fit.

Convergence troubleshooting

The most common reason to open this page is a fit that didn’t converge or gave surprising results. Work through this list in order:

1. Is the STATUS line red / does fit$converged return FALSE?

Check ferx_warnings(fit) — boundary parameters, high condition number, and unconverged subjects all appear there with guidance.
Run ferx_check_init() first: if ofv_drop is near zero, the initial values are the problem. Fix them or use inits_from_nca = "nca_sweep".
If OFV is dropping but hit the iteration cap, increase maxiter.
Plot the trace (optimizer_trace = TRUE then ferx_plot_trace(fit)) — is the OFV still declining at the last iteration, or has it stalled?

2. OFV converged but optimizer stalled above expected minimum

Switch from gradient-based (slsqp) to derivative-free (bobyqa) — the FOCE surface has small discontinuities from EBE re-estimation that trip gradient-based methods.
If using slsqp or lbfgs, try reconverge_gradient_interval = 1 to recover the full gradient surface (costs ~5–6× per outer iteration).
Try global_search = TRUE to escape a local basin.
Try n_starts = 8 for multi-start (≈ same wall time as one run on 8 cores).

3. Some subjects always fail their EBE inner loop

Raise inner_maxiter (e.g. 200L → 500L) and tighten inner_tol (e.g. 1e-4 → 1e-6).
Allow a fraction of failures: max_unconverged_frac = 0.1 (default). Raise to 0.2 for large sparse datasets.
Exclude very sparse subjects from the count: min_obs_for_convergence_check = 3.

4. IOV model stalls

Use a SAEM → FOCEI chain: method = [saem, focei]. The IOV variance surface is weakly identified and SAEM finds it more reliably than a cold FOCEI start.
Do not use slsqp with IOV without reconverge_gradient_interval = 1 — the fixed-EBE gradient cannot see the IOV variance direction.

5. SAEM doesn’t look converged

Inspect the trace (optimizer_trace = TRUE): the population parameters should settle to stable averages by the end of phase 2.
Increase n_convergence (default 250 → 400+) if the averaging phase is too short.
Increase omega_burnin (default 20) if Ω collapses in early iterations on sparse data.
SAEM with HMC (n_leapfrog = 3) mixes faster for correlated ETAs.

Which method should I use?

Situation	Recommended
Starting out / standard PK	`focei` (default)
Fast feedback during model building	`gn` or `gn_hybrid`
Complex model, many ETAs, convergence issues	`c("saem", "focei")` chain
Need the most accurate marginal log-likelihood	`c("focei", "imp")`
Compare models on marginal likelihood	`imp` as final chain stage

The method chaining guide covers these in depth with worked R code.

Estimation methods

`focei` — FOCE with Interaction (default)

Linearises around the EBE η̂; evaluates residual variance at η̂ (not η = 0). Correct for proportional and combined error models. The right choice for most NLME fits.

[fit_options]
  method     = focei
  covariance = true

Works with: all outer optimizers · BLOQ M3 · IOV · SAEM/GN chains · SIR

`foce` — FOCE (no interaction)

Evaluates residual variance at η = 0. Equivalent to FOCEI for additive error; slightly less accurate for proportional or combined. Use when reproducing a NONMEM FOCE result.

Works with: same as FOCEI.

`gn` — Gauss-Newton

BHHH Hessian approximation with Levenberg-Marquardt damping. Converges in 10–30 iterations on well-conditioned problems — the fastest method for feedback during model building. Does not accept an outer optimizer key.

[fit_options]
  method     = gn
  covariance = false   # skip covariance step for fast iterations

GN-specific option:

Key	Default	Description
`gn_lambda`	`0.01`	Levenberg-Marquardt damping. Increase (e.g. `0.1`) when GN steps are too aggressive on a rough surface.

Works with: IOV · BLOQ M3 · multi-start · NCA inits

Does not accept: optimizer, global_search, stagnation_guard, reconverge_gradient_interval

`gn_hybrid` — Gauss-Newton then FOCEI polish

GN finds the basin; FOCEI refines to the MLE. Useful when GN is fast but its approximate Hessian gives slightly suboptimal final parameter estimates.

[fit_options]
  method     = gn_hybrid
  covariance = true

Works with: all outer optimizers (applied to the FOCEI polish stage) · everything GN accepts.

`saem` — Stochastic Approximation EM

Replaces MAP optimization with Metropolis-Hastings sampling for the E-step. More robust to multi-modal ETA posteriors and high-shrinkage scenarios. Run time is controlled by iteration counts, not maxiter.

[fit_options]
  method        = saem
  n_exploration = 150
  n_convergence = 250
  covariance    = true

SAEM-specific options:

Key	Default	Description
`n_exploration`	`150`	Phase 1 iterations — stochastic, step size γ = 1. Sampler explores the posterior.
`n_convergence`	`250`	Phase 2 iterations — step size anneals to 0. Running average converges.
`n_mh_steps`	`10`	Metropolis-Hastings proposals per subject per iteration. More = better mixing, slower per iteration.
`adapt_interval`	`50`	How often (iterations) to adapt the MH proposal scale.
`omega_burnin`	`20`	Hold Ω fixed for this many iterations while the sampler warms up. Guards against Ω collapse on sparse data. Set `0` to disable.
`seed` / `saem_seed`	`12345`	RNG seed for the MH sampler. Independent of `multi_start_seed`.
`n_leapfrog` / `saem_n_leapfrog`	`0`	Leapfrog steps for HMC proposals. `0` = Metropolis-Hastings. A positive value replaces MH with HMC — better mixing for correlated ETAs. The HMC η-sampler uses the analytic `Dual2` sensitivity provider.

Works with: IOV · BLOQ M3 · method chains

Does not accept: optimizer, maxiter, global_search, reconverge_gradient_interval

`imp` — Importance Sampling (chain-terminal stage)

Computes the marginal log-likelihood by Monte Carlo importance sampling around the FOCEI/SAEM solution. Does not update parameters — purely diagnostic. Must be the last stage of a chain.

[fit_options]
  method = [focei, imp]

or in R: ferx_fit(model, data, method = c("focei", "imp")).

The marginal -2LL is more accurate than the FOCEI Laplace approximation and is comparable across non-nested models in the same family.

IMP-specific options (pass via settings = list(...) in R):

Key	Default	Description
`is_samples`	`1000`	Monte Carlo draws per subject. Halving the MC SE requires 4× more samples.
`is_proposal_df`	`5`	Student-t proposal degrees of freedom. Smaller = heavier tails = more robust to misspecified covariance, but less efficient.
`is_seed`	(engine)	RNG seed for IS step.
`is_low_ess_threshold`	`0.1`	ESS fraction below which a subject is flagged in `fit$importance_sampling$low_ess_subject_ids`.

Works with: any method as the preceding stage · IOV (kappa is fixed at EBE mode; a note appears in print(fit))

Method chains

Pass a list to run stages in sequence. Each stage is seeded from the previous converged parameters; only the final stage produces covariance/diagnostics.

[fit_options]
  method = [saem, focei]

In R: ferx_fit(model, data, method = c("saem", "focei")).

Common chains and their purpose:

Chain	Why
`[saem, focei]`	SAEM finds the basin robustly; FOCEI polishes to the MLE
`[gn, focei]`	Fast GN warm-up then accurate FOCEI
`[focei, imp]`	FOCEI fit + exact marginal log-likelihood
`[saem, imp]`	Most robust: SAEM basin + exact marginal LL
`[saem, focei, imp]`	Full pipeline: robust start → MLE → exact marginal LL

Options that apply to any stage in the chain are accepted without a warning (e.g. both n_exploration and maxiter are valid in a [saem, focei] chain).

Outer optimizer

Applies to FOCE / FOCEI / GN-hybrid (not pure GN, not SAEM).

[fit_options]
  optimizer = bobyqa    # default

Optimizer	Algorithm	When to use
`bobyqa`	NLopt BOBYQA — derivative-free (default)	General use. Robust to small surface discontinuities.
`slsqp`	NLopt SLSQP — gradient-based	Smooth, well-conditioned problems. Faster per iteration. Pair with `reconverge_gradient_interval = 1` on IOV or difficult models.
`trust_region`	Newton trust-region with Steihaug CG	Ill-conditioned problems; guaranteed descent each step.
`lbfgs` / `nlopt_lbfgs`	L-BFGS	Moderate-sized parameter spaces.
`mma`	NLopt Method of Moving Asymptotes	When SLSQP stalls.
`bfgs`	Hand-rolled BFGS	Small parameter spaces.

Trust-region tuning:

Key	Default	Description
`steihaug_max_iters`	`ceil(sqrt(n_params))` ≥ 5	CG subproblem budget. Increase for ill-conditioned models. Only consumed with `optimizer = trust_region`.

Gradient-based optimizer tuning:

Key	Default	Description
`stagnation_guard`	`true`	Terminate NLopt early when OFV plateau is numerically flat. Set `false` to let SLSQP/L-BFGS run to their own xtol/ftol or `maxiter`.
`reconverge_gradient_interval`	`0`	Re-solve EBEs during the population gradient every N evaluations. `0` = fixed-EBE gradient (cheap). `1` = reconverge every evaluation (~5–6× cost, recovers the full surface). Helps gradient-based optimizers past ill-conditioned plateaus. IOV models always reconverge regardless of this setting.

Global pre-search (before local optimizer):

Key	Default	Description
`global_search`	`false`	Run NLopt CRS2-LM global search before local refinement.
`global_maxeval`	`200 * (n_params + 1)`	OFV evaluation budget for the global stage.

Convergence and iteration control

Key	Methods	Default	Description
`maxiter`	FOCE/FOCEI/GN/GN-hybrid	`500`	Maximum outer iterations. Not used by SAEM (use `n_exploration`/`n_convergence`).
`inner_maxiter`	all	`200`	Per-subject EBE iteration cap.
`inner_tol`	all	`1e-4`	Gradient-norm convergence tolerance for the inner EBE loop.
`max_unconverged_frac`	all	`0.1`	Fraction of subjects (with ≥ `min_obs_for_convergence_check` obs) allowed unconverged EBEs before the step is rejected. Raise for sparse populations.
`min_obs_for_convergence_check`	all	`2`	Subjects with fewer observations are excluded from the convergence-guard count.

Special data features

Key	Methods	Default	Description
`iov_column`	all	—	Name of the occasion column (e.g. `OCC`). Required for IOV / kappa models.
`bloq_method`	all	`drop`	BLOQ handling: `drop` excludes CENS=1 rows; `m3` uses Beal’s M3 likelihood (requires `CENS` column, DV = LLOQ on censored rows).

Gradient method

Controls how the inner-loop (per-subject EBE) gradient is computed. Set in the model file with gradient = ... or in R with the gradient argument.

Value	Description
`auto` (default)	Exact analytic `Dual2` sensitivities when the model is in scope (analytical PK path); finite differences otherwise. Right choice for almost all uses.
`fd`	Force finite differences. Always available. Required for SDE (`[diffusion]`) models and Form-C per-CMT scaling.

The legacy ad value is no longer accepted and returns an E_AD_RETIRED error — use auto or fd instead.

Set FERX_TIME_GRADIENTS=1 to print per-call timings at the end of a fit.

Starting values and robustness

Key / argument	Methods	Default	Description
`inits_from_nca`	all	`false`	Derive starting values from NCA before the optimizer. `true` = `"nca_sweep"`. Options: `"nca"`, `"nca_sweep"`, `"nca_ebe"`. Most useful with `gn` or `trust_region`.
`n_starts`	FOCE/FOCEI/GN	`1`	Independent fits from perturbed initial values (multi-start). Best OFV wins.
`start_sigma`	FOCE/FOCEI/GN	`0.3`	Log-space perturbation for multi-start (≈ 30% CV).
`multi_start_seed`	FOCE/FOCEI/GN	`42`	RNG seed for start-point perturbations.
`scale_params`	all	`false`	Apply per-coordinate scaling to the outer optimizer. Leave `false` (default) — see note.
`mu_referencing`	all	`true`	Auto-detect mu-referencing from `[individual_parameters]`. Disable only for debugging.

Why scale_params defaults to false

Scaling divides each log/Cholesky-packed coordinate by its initial absolute value, so the optimizer works in a near-unit-magnitude space. This sounds helpful but is counterproductive for standard NLME: a coordinate like log(V) = log(20) ≈ 3 gets a scale factor of 3, so the optimizer’s unit step becomes a 3-unit move in log-space — a 20× multiplicative jump in V. That overshoots and, through the uniform gradient cap, starves the step in Ω dimensions. Leave at false unless you have a specific reason (e.g. A/B testing the scaled trajectory on an unusual model).

Uncertainty quantification

SIR (post-fit, requires `covariance = true`)

Key	Default	Description
`sir`	`false`	Enable Sampling Importance Resampling after the fit.
`sir_samples`	`1000`	Candidate draws (M). More → smoother CI, slower.
`sir_resamples`	`250`	Retained draws (m). Rule of thumb: m ≈ M/4.
`sir_seed`	(engine)	RNG seed, independent of `saem_seed`.

IMP (chain stage — see above)

IMP is the more accurate alternative when you need the marginal log-likelihood. SIR gives you non-parametric parameter uncertainty intervals. They serve different purposes and can be combined:

[fit_options]
  method        = [focei, imp]
  sir           = true
  sir_samples   = 2000
  sir_resamples = 500
  covariance    = true

Diagnostics and output

Key / argument	Default	Description
`optimizer_trace`	`false`	Write per-iteration CSV to a temp file. Read with `ferx_trace(fit)`; plot with `ferx_plot_trace(fit)`.
`threads`	(all CPUs)	Thread count for the Rust backend. Cap on shared machines or to avoid SMT contention: `parallel::detectCores(logical = FALSE)`.
`covariance`	`true`	Compute covariance matrix and SEs. Set `false` during exploratory iterations to save time.

Settings precedence

Three layers can set the same option. The order of precedence (highest first):

Dedicated ferx_fit() arguments (method, covariance, bloq_method, gradient, threads, sir, optimizer_trace, scale_params, inits_from_nca)
settings = list(...) — fine-grained overrides in R
[fit_options] in the .ferx file — model-file defaults

A warning() is emitted when a call-time value overrides the model file. Inspect fit$model_file_settings and fit$call_settings to audit.

Complete worked examples

Standard FOCEI (default):

[fit_options]
  method     = focei
  covariance = true

Fast exploratory GN:

[fit_options]
  method     = gn
  covariance = false

SAEM → FOCEI chain (complex model):

[fit_options]
  method        = [saem, focei]
  n_exploration = 200
  n_convergence = 300
  omega_burnin  = 30
  covariance    = true

FOCEI + exact marginal log-likelihood:

[fit_options]
  method   = [focei, imp]
  is_samples = 2000
  covariance = true

IOV model:

[fit_options]
  method     = focei
  iov_column = OCC
  covariance = true

IOV with SAEM (robust):

[fit_options]
  method        = [saem, focei]
  iov_column    = OCC
  n_exploration = 150
  n_convergence = 250
  omega_burnin  = 20
  covariance    = true

BLOQ M3:

[fit_options]
  method      = focei
  bloq_method = m3
  covariance  = true

Multi-start (8 parallel starts):

[fit_options]
  method           = focei
  n_starts         = 8
  start_sigma      = 0.3
  multi_start_seed = 42
  covariance       = true

Tight convergence + optimizer trace:

[fit_options]
  method          = focei
  optimizer       = slsqp
  maxiter         = 500
  inner_maxiter   = 100
  inner_tol       = 1e-6
  optimizer_trace = true
  covariance      = true

FOCEI + SIR uncertainty:

[fit_options]
  method        = focei
  covariance    = true
  sir           = true
  sir_samples   = 2000
  sir_resamples = 500

SAEM with HMC proposals:

[fit_options]
  method       = saem
  n_leapfrog   = 3
  omega_burnin = 30
  covariance   = false

Options by method — quick reference

Option	foce	focei	gn	gn_hybrid	saem	imp
`maxiter`	✓	✓	✓	✓	—	—
`optimizer`	✓	✓	—	✓	—	—
`global_search`	✓	✓	—	✓	—	—
`reconverge_gradient_interval`	✓	✓	—	✓	—	—
`stagnation_guard`	✓	✓	—	✓	—	—
`gn_lambda`	—	—	✓	✓	—	—
`n_exploration` / `n_convergence`	—	—	—	—	✓	—
`n_mh_steps` / `omega_burnin`	—	—	—	—	✓	—
`n_leapfrog`	—	—	—	—	✓	—
`is_samples` / `is_proposal_df`	—	—	—	—	—	✓
`inner_maxiter` / `inner_tol`	✓	✓	✓	✓	✓	—
`iov_column`	✓	✓	✓	✓	✓	—
`bloq_method`	✓	✓	✓	✓	✓	—
`n_starts` / `start_sigma`	✓	✓	✓	✓	—	—
`sir` / `sir_samples`	✓	✓	✓	✓	✓	—
`covariance`	✓	✓	✓	✓	✓	—

(Chains: an option is accepted if it applies to any stage.)