Fit a nonlinear mixed effects model

Description

Fits a NLME model using FOCE or FOCEI estimation with a Rust backend.

Usage

ferx_fit(
  model,
  data = NULL,
  method = NULL,
  covariance = NULL,
  verbose = NULL,
  bloq_method = NULL,
  threads = NULL,
  mu_referencing = NULL,
  sir = NULL,
  gradient = NULL,
  optimizer_trace = FALSE,
  scale_params = FALSE,
  inits_from_nca = FALSE,
  fd_hessian_step = NULL,
  settings = NULL,
  output = NULL,
  include_data = FALSE,
  ignore = NULL,
  accept = NULL,
  ignore_ids = NULL,
  ...
)

Arguments

model: Path to a .ferx model file, or a [ferx_model](ferx_model.qmd) object.
data: Path to a NONMEM-format CSV file. Required columns: ID, TIME, DV, EVID, AMT, CMT. Optional columns recognised by the engine: RATE (infusion rate; RATE = -1 infuses AMT at a modeled rate given by a per-subject parameter R{n}, and RATE = -2 over a modeled duration given by D{n}, on dose compartment n), MDV (missing-DV flag), II (dosing interval, required when SS > 0), SS (steady-state flag: 1 = pre-dose at steady state, 2 = add SS concentration to current state), CENS (LOQ censoring flag for M3 method: 1 below LLOQ, -1 above ULOQ), OCC (occasion index for IOV), and any covariate columns referenced in the model. See the steady-state section below for SS/II details.
method: Estimation method(s). NULL (the default) uses whatever the model file’s [fit_options] method specifies, falling back to "focei" if the model file sets none (the engine warns when it defaults). Pass a value here to override the model file: either a single string or a character vector of methods to run in sequence. Each stage is seeded with the previous stage’s converged parameters, and only the final stage produces the reported covariance/diagnostics. Supported methods: "foce", "focei", "saem", "gn" (Gauss-Newton / BHHH), "gn_hybrid" (Gauss-Newton followed by a FOCEI polish step), "imp" (also accepted as "importance_sampling" or "importance-sampling"; the NONMEM METHOD=IMP importance-sampling Monte-Carlo EM estimator), or "impmap" (also accepted as "importance_sampling_map"; Importance Sampling assisted by Mode A Posteriori, the NONMEM METHOD=IMPMAP Monte-Carlo EM estimator). "imp" is an estimator by default (it updates parameters) and may run standalone (method = "imp"), lead, or sit mid-chain. Set settings = list(imp_eval_only = TRUE) (NONMEM EONLY=1) to make it instead evaluate the marginal -2 log L at the fixed input parameters; in that mode it must be the last entry of the chain, e.g. c("focei", "imp"). Plain "imp" re-centers its proposal from the previous iteration’s samples and so is fragile on rich data; prefer "impmap" or warm-start with c("focei", "imp") there. "impmap" is a full estimator: it may run standalone (method = "impmap") or as a chain stage (c("focei", "impmap")), requires a mu-referenced parameterization, and does not yet support IOV. Example chain: c("saem", "focei"). "bayes" (also accepted as "bayesian" or "mcmc") runs full MCMC Bayesian estimation (Gibbs-within-HMC, NONMEM METHOD=BAYES parity): it returns posterior means with credible intervals and convergence diagnostics on $bayes rather than a point estimate, and runs standalone. Supports BSV and zero-mean inter-occasion variability (per-occasion kappa); the IOV variance posterior appears as OMEGA_IOV(...) in $bayes. SAEM fully supports inter-occasion variability (IOV / kappa) models.
covariance: Logical, or NULL (the default). Whether to compute the covariance step for standard errors. NULL uses the model file’s [fit_options] covariance (engine default TRUE when unset); a logical overrides the model file. Previously defaulted to TRUE and silently overrode a model file that set covariance = false (#558).
verbose: Logical, or NULL (the default). Print progress during estimation. NULL uses the model file’s [fit_options] verbose (engine default TRUE when unset); a logical overrides it.
bloq_method: Handling of observations outside quantification limits. NULL (default) keeps whatever the model file specified; "m3" enables Beal’s M3 likelihood (requires a CENS column in the data, with DV carrying the limit value: LLOQ on CENS=1 rows and ULOQ on CENS=-1 rows); "drop" disables M3 and treats censored rows as ordinary observations.
threads: Number of worker threads for the per-subject parallel loops in the Rust backend (inner EBE search, SAEM, SIR). NULL (default) leaves the backend’s thread pool at its default of one worker per logical CPU. Pass an integer to cap the pool - useful on shared machines, in CI, or to avoid SMT-induced contention (try parallel::detectCores(logical = FALSE)). The setting is per-call, so successive fits in the same R session can use different values.
mu_referencing: Logical, or NULL (the default). When TRUE, automatically detects mu-referencing from the model structure for faster and more accurate convergence. NULL uses the model file’s [fit_options] mu_referencing (engine default TRUE when unset). Applies to all estimation methods. Set to FALSE to disable for comparison purposes. Detection works automatically for standard parameterizations such as PARAM = THETA * exp(ETA); unusual parameterizations fall back silently to zero-centred ETA initialisation with no error. No changes to the .ferx model file are needed. Check fit$warnings to see which ETAs were detected.
sir: Logical, or NULL (the default); run Sampling Importance Resampling after the fit to produce non-parametric parameter uncertainty intervals. Requires covariance = TRUE. NULL uses the model file’s [fit_options] sir (engine default FALSE when unset). Tuning knobs (sir_samples, sir_resamples, sir_seed) still flow through settings.
gradient: Inner-loop (per-subject EBE) gradient method. One of "auto" or "fd", or NULL (the default) to use the model file’s [fit_options] gradient (engine default "auto" when unset). The legacy "ad" token is accepted by older model files but should not be used for new code.The inner optimizer is BFGS; what differs is how the gradient of the individual NLL w.r.t. ETA is computed:
"fd": central finite differences, 2 * n_eta forward evaluations per gradient call. Always available.
"auto" (default): let the engine choose the best available gradient path for the model. This is the right choice for almost all uses.When to set "fd" explicitly. Mainly for validation or for reproducibility against a known finite-difference baseline.Set FERX_TIME_GRADIENTS=1 in the environment to print per-gradient-call timings at the end of a fit, which is the easiest way to check which method is faster on your specific model/data.
optimizer_trace: Logical. If TRUE, write a per-iteration CSV trace to a temporary file and store its path in fit$trace_path. Pass the result to [ferx_trace](ferx_trace.qmd) or [ferx_plot_trace](ferx_plot_trace.qmd) to inspect optimizer progress. Default FALSE.
scale_params: Logical. If TRUE, apply a per-coordinate scaling layer on top of the existing log/Cholesky parameterization, dividing each transformed coordinate by |x0[i]| (when |x0[i]| > 0.1, otherwise by 1.0) so the outer optimizer works in a near-unit-magnitude space. Default FALSE.Why it defaults to FALSE. The scaling is not trajectory-transparent. Although it leaves the OFV value unchanged at any fixed point, it rescales the gradient the optimizer sees, and that gradient feeds the SLSQP overshoot cap, the quasi-Newton Hessian estimate, and the xtol/ftol termination - all of which act in the scaled coordinate system. The scaling layer only ever runs on log/Cholesky-packed coordinates (it auto-disables when any identity-packed theta is present), and for those, dividing by |log value| is counterproductive: a coordinate like log(V) = log(20) ~ 3 gets scale 3, so the optimizer’s unit step becomes a 3-unit move in log space - an e^3 ~ 20x multiplicative jump in V. That large step both overshoots and, through the uniform gradient cap, starves the step in every other dimension (notably OMEGA), so the fit can halt well short of the minimum. See ferx-core issue #99. The FALSE default reproduces the well-tested pre-scaling-layer behaviour.When to set TRUE. Left as an opt-in for experimentation - e.g. to A/B the scaled vs unscaled trajectory on a specific model. When enabled it applies to all outer optimizers: NLopt FOCE/FOCEI (BOBYQA, SLSQP, L-BFGS, MMA), the hand-rolled BFGS, Gauss-Newton / BHHH, and the SAEM M-step.
inits_from_nca: Derive NCA-based starting values from the data before the optimizer runs, overriding the model file’s defaults. Either a logical (FALSE, the default, disables it; TRUE is an alias for "nca_sweep") or one of "nca", "nca_sweep", "nca_ebe" to pick a strategy explicitly. Most useful with settings = list(optimizer = "trust_region") or method = "gn", where bad starting values can stall the optimizer. See [ferx_inits_from_nca](ferx_inits_from_nca.qmd) to inspect the values without fitting.
fd_hessian_step: Positive finite numeric. Relative step size for the finite-difference Hessian used in the covariance step (default 0.01). The actual perturbation for parameter i is fd_hessian_step * (1 + |x_hat[i]|). Increase (e.g. 0.1) when the fit warns about ill-conditioned Hessian entries; decrease (e.g. 1e-3) on smooth OFV surfaces where FD noise is the primary concern. Has no effect when covariance = FALSE.
settings: Optional named list of fine-grained options forwarded to the Rust FitOptions. Use this to tune knobs that do not have a dedicated ferx_fit() argument. Keys are validated: values that duplicate a dedicated argument (method, covariance, verbose, bloq_method, threads, sir) are rejected – pass them via the dedicated argument. Unknown keys and malformed values also raise an error.Precedence: dedicated ferx_fit() arguments win over settings, which in turn win over the model file’s [fit_options] block. A warning() is issued whenever a call-time value overrides a different value from [fit_options]. Inspect fit$model_file_settings and fit$call_settings to audit the full picture.Shared options (FOCE / FOCEI / GN / GN-hybrid / SAEM)
inner_maxiter: Per-subject EBE iteration cap (default 200).
inner_tol: Convergence tolerance for the inner EBE loop (default 1e-4).
fd_hessian_step: Also available as the dedicated fd_hessian_step argument above.
covariance_method: Covariance estimator, mirroring NONMEM $COV MATRIX=: "r" (inverse Hessian, the default), "s" (score cross-product), or "rsr" (the Huber-White sandwich, robust to model misspecification). Validated for FOCEI; "s" and "rsr" are rejected under non-interaction FOCE. No effect when covariance = FALSE.
covariance_fallback: "none" (default) or "sir". When the finite-difference Hessian is not positive definite, "sir" runs SIR with an absolute-eigenvalue-rectified proposal instead of leaving the covariance step failed; covariance_status is then "sir_fallback" and SIR-based credible intervals are reported.ODE models: RK45 solver tolerance
ode_reltol: RK45 relative tolerance for ODE models (default 1e-4; ignored for analytical PK). The default reproduces analytical closed forms in PRED to about 1e-4, but the FOCE objective amplifies solver error, so an ODE-form model’s OFV can differ from its analytical equivalent by several units. Set tighter (e.g. 1e-10) when the ODE-form OFV must match an analytical reference; expect slower fits. Can also be set in the model file’s [fit_options] block.
ode_abstol: RK45 absolute tolerance for ODE models (default 1e-6).
ode_max_steps: Maximum RK45 steps per integration segment (default 10000). Raise if a tight ode_reltol exhausts the step budget on stiff multi-compartment systems.FOCE / FOCEI / GN / GN-hybrid: iteration cap
maxiter: Maximum outer-optimizer iterations (default 500). Not applicable to SAEM, which controls iterations via n_exploration and n_convergence.FOCE / FOCEI / GN-hybrid: outer optimizer
optimizer: Population-level optimizer. One of "auto" (default; picks "nlopt_lbfgs" when the exact analytic FOCE/FOCEI gradient is available and "bobyqa" when only finite differences are – the fit reports the resolved pick as "auto (<resolved>)"), "bobyqa" (derivative-free, robust), "slsqp" (sequential quadratic programming, gradient-based), "lbfgs" / "nlopt_lbfgs" (limited-memory BFGS via NLopt, gradient-based), "mma" (method of moving asymptotes, gradient-based), "bfgs" (hand-rolled BFGS, gradient-based), or "trust_region" (trust-region Newton, gradient-based). Gradient-based optimizers use the inner gradient method set by the gradient argument; "bobyqa" does not. Not accepted by pure "gn".
outer_xtol: Relative step tolerance for the derivative-free "bobyqa" outer optimizer (NLopt xtol_rel; default 1e-4). Gradient-based optimizers use their own fixed internal tolerances. Can also be set in the model file’s [fit_options].
outer_ftol: Relative objective tolerance for the derivative-free "bobyqa" outer optimizer (NLopt ftol_rel). Unset = auto: 1e-8 for a pure time-to-event model (its hazard objective is evaluated exactly) and 1e-6 otherwise. The TTE tightening lands the frailty variance on the NONMEM/nlmixr2 minimum across a near-flat omega-squared ridge (ferx-core #469); the 1e-6 floor elsewhere avoids grinding on noisy ODE / FD-inner objectives, where 1e-8 is unreachable. Set an explicit value to pin it for every model.
global_search: Logical. When TRUE, run a global search phase before local refinement (default FALSE). Not accepted by pure "gn".
global_maxeval: Function evaluations budget for the global search phase (default 0, i.e. disabled when global_search = FALSE). Not accepted by pure "gn".
stagnation_guard: Logical (default TRUE). Terminates the NLopt outer loop early when the OFV plateau is numerically flat. Set FALSE to let SLSQP / L-BFGS run to their own xtol/ftol or maxiter. Not accepted by pure "gn".
reconverge_gradient_interval: Integer (default 0). How often to re-solve each subject’s inner EBE during the population gradient, instead of holding it fixed. 0 = never (cheap fixed-EBE gradient); 1 = every gradient evaluation; N = every N-th. Reconverging recovers the full gradient surface at roughly 5-6x the per-call cost and can help gradient-based optimizers (e.g. slsqp) past ill-conditioned non-IOV plateaus. IOV models always reconverge and ignore this setting. Not accepted by pure "gn".Trust-region optimizer (optimizer = "trust_region")
steihaug_max_iters: Conjugate-gradient iteration budget for the trust-region subproblem (default chosen by the engine). Only consumed when optimizer = "trust_region" is set under FOCE / FOCEI / GN-hybrid. Increase if the subproblem solver exits too early on ill-conditioned problems.SAEM
n_exploration: Stochastic exploration phase iterations (default 150).
n_convergence: Averaging / convergence phase iterations (default 250).
n_mh_steps: Metropolis-Hastings steps per subject per iteration (default 10).
adapt_interval: How often (in iterations) the MH proposal covariance is adapted (default 50).
omega_burnin: Initial iterations during which Omega is held fixed while the sampler warms up (default 20). Guards against Omega collapse on sparse data. Set to 0 to disable.
seed / saem_seed: RNG seed for the SAEM Metropolis-Hastings sampler (default 12345). Independent of multi_start_seed.
n_leapfrog / saem_n_leapfrog: Leapfrog steps for HMC proposals (default 0 = Metropolis-Hastings). A positive value replaces MH with one HMC proposal per subject per iteration.Bayes ("bayes")
bayes_warmup: Warmup (burn-in + adaptation) sweeps per chain, discarded from the posterior (default 1000).
bayes_iters: Retained sampling sweeps per chain, before thinning (default 1000).
bayes_chains: Number of independent chains (default 4); used for split-R-hat.
bayes_thin: Keep every bayes_thin-th sampling draw (default 1).
bayes_seed: Base RNG seed for the Bayes sampler. Independent of seed / saem_seed.Gauss-Newton ("gn" / "gn_hybrid")
gn_lambda: Levenberg-Marquardt damping factor (default 0.01). Larger values make steps more conservative. Accepted by both "gn" and "gn_hybrid".Importance Sampling ("imp")By default "imp" is a Monte-Carlo EM estimator (NONMEM METHOD=IMP); set imp_eval_only = TRUE to evaluate the marginal -2 log L at fixed parameters instead (NONMEM EONLY=1).
imp_eval_only: Logical; TRUE evaluates -2 log L at the fixed input parameters without estimating (NONMEM EONLY=1; must be the terminal chain stage). FALSE (default) estimates.
imp_iterations: Number of Monte-Carlo EM iterations, ignored when imp_eval_only (default 200).
imp_averaging: Number of final iterations whose parameters are averaged into the reported estimate, ignored when imp_eval_only (default 50).
imp_samples: Importance samples drawn per subject (default 1000). Halving the Monte-Carlo SE requires quadrupling this value.
imp_proposal_df: Degrees of freedom for the Student-t proposal distribution (default 5); the string "normal" (or "mvn") selects a multivariate-normal proposal.
imp_seed: RNG seed for the IS step (default chosen by the engine).
imp_low_ess_threshold: ESS fraction below which a subject is flagged in fit$importance_sampling$low_ess_subject_ids (default 0.1, i.e. 10% of imp_samples).IMPMAP ("impmap" estimator)
impmap_iterations: Number of Monte-Carlo EM iterations (default 200).
impmap_samples: Importance samples drawn per subject per iteration (default 300).
impmap_proposal_df: Proposal degrees of freedom. The string "normal" (default) selects a multivariate-normal proposal (the NONMEM default); a number >= 1 selects a heavier-tailed Student-t.
impmap_averaging: Number of final iterations whose parameters are averaged into the reported estimate (default 50).
impmap_seed: RNG seed for the IMPMAP sampling (default chosen by the engine).
impmap_low_ess_threshold: ESS fraction below which a subject is flagged as poorly sampled (default 0.1).
impmap_trace: Logical; when TRUE, collect per-iteration parameter values into fit$impmap_trace (analogous to NONMEM .ext output). Default FALSE.SIR (Sampling Importance Resampling)
sir_samples: Candidate draws for SIR (default 1000).
sir_resamples: Resampled draws retained for CI computation (default 250).
sir_seed: RNG seed for the SIR step (default chosen by the engine). Independent of seed / saem_seed.Multi-start optimization
n_starts: Number of optimizer starts (default 1, i.e. single start). When > 1, starts 2..n are initialized from log-space perturbations of the model’s initial values; the best OFV wins.
start_sigma: Perturbation spread in log-space (default 0.3, approx. 30% CV). Log-packed thetas are multiplied by exp(N(0, start_sigma)); identity-packed thetas are shifted by start_sigma * N(0,1).
multi_start_seed: RNG seed for the start-point perturbation (default 42). Independent of seed / saem_seed.
output: Optional path to a .fitrx file. When non-NULL, [ferx_save_fit](ferx_save_fit.qmd) is invoked on the result so the fit is persisted to disk in a portable, cross-language bundle (zip of JSON + CSV) before the function returns. Equivalent to calling ferx_save_fit(fit, output) after the fit. See the format reference in the ferx-core docs for the on-disk schema.
include_data: Logical. Only meaningful with output. When TRUE, embeds the input data CSV verbatim inside the .fitrx bundle so the file is self-contained. Default FALSE.
ignore: Character vector of filter expressions that exclude a record when the expression evaluates to true (NONMEM $DATA IGNORE= equivalent). Expressions use standard comparison operators (==, !=, <, <=, >, >=) on column names. Use && within one expression for AND; for OR use multiple elements. These conditions are merged with any [data_selection] ignore rules in the model file. Example: ignore = c("DV < 0.001", "EVID != 0 && MDV == 1").
accept: Character vector of filter expressions. A record is kept only when all conditions pass; excluded if any fail (NONMEM $DATA ACCEPT= equivalent). Merged with model-file accept rules.
ignore_ids: Numeric or character vector of subject IDs to exclude entirely. Sugar for ignore = "ID == <id>" applied per-subject. Merged with [data_selection] ignore_subjects from the model file.
...: Reserved for future use. Unrecognised arguments raise an error.

Process noise (SDE / diffusion)

ODE-based PK/PD models occasionally produce autocorrelated IWRES when the structural model is misspecified (missing compartment, unmodelled feedback). Adding continuous within-subject process noise via an SDE framework - solved by an Extended Kalman Filter (EKF) - absorbs this drift and yields a better-calibrated likelihood. Add a [diffusion] block to the .ferx model file to enable SDE mode. Each line declares the diffusion variance for one ODE state:

[diffusion]
  central ~ 0.5   # initial estimate for DIFF_CENTRAL (variance units)

Key points:

The declared value and the fitted estimate are variances, not standard deviations. A value of 0.5 means $\sigma^2 = 0.5$, not $\sigma = 0.5$.
Each diffusion parameter appears in fit$theta as DIFF_<STATE> (e.g. DIFF_CENTRAL). Standard errors and ferx_estimates() treat them as regular thetas.
SDE models use finite differences for the EKF covariance propagation.
SAEM is not supported with SDE models; a hard error is raised.
fit$uses_sde is TRUE whenever a [diffusion] block was present.

c(“## Unit conversion and scaling (an optional [scaling] block to the .ferx model file tomodel predictions before comparing them to the observations in DV.is useful when the model is parameterised in one unit (e.g. amounts in) but DV is recorded in another (e.g. concentrations in ng/mL).engine divides each prediction by the scale factor before computing, so the fitted sigma is on the DV scale.Form A – scalar divisor (safe with AD):\n[scaling]\n obs_scale = 1000 # divide every predicted value by 1000\nForm B – expression divisor (forces gradient = fd):\n[scaling]\n obs_scale = V # divide by the individual volume (theta/eta expression)\nmay reference thetas, etas, and [individual_parameters]variables. AD gradients are not supported for Form B; addgradient = fd to [fit_options] or the engine will error.Per-CMT scaling (Form A or B per compartment):\n[scaling]\n obs_scale[CMT=1] = 1000 # CMT 1 in ng/mL, dose in ug\n obs_scale[CMT=2] = 1 # CMT 2 already in correct units\nForm C – ODE readout expression (forces gradient = fd):For ODE models, y = <expr> defines the observation as a function ofstate variables, thetas, etas, and individual parameters. This replacesobs_cmt= argument in [structural_model] and is requiredthe observation is not a raw compartment amount:\n[structural_model]\n ode(states = [depot, central], ...) # no obs_cmt= here\n\n[scaling]\n y = central / V # observation = central compartment / volume\n-CMT Form C: y[CMT=1] = central / V1, y[CMT=2] = central2 / V2.Constraints:[scaling] is not yet supported on SDE ([diffusion]) models.Forms B, per-CMT, and Form C force finite-difference gradients; addgradient = fd to [fit_options] explicitly, or theraises an informative error.”, ”## list("[scaling]")an optional [scaling] block to the .ferx model file tomodel predictions before comparing them to the observations in DV.is useful when the model is parameterised in one unit (e.g. amounts in) but DV is recorded in another (e.g. concentrations in ng/mL).engine divides each prediction by the scale factor before computing, so the fitted sigma is on the DV scale.Form A – scalar divisor (safe with AD):\n[scaling]\n obs_scale = 1000 # divide every predicted value by 1000\nForm B – expression divisor (forces gradient = fd):\n[scaling]\n obs_scale = V # divide by the individual volume (theta/eta expression)\nmay reference thetas, etas, and [individual_parameters]variables. AD gradients are not supported for Form B; addgradient = fd to [fit_options] or the engine will error.Per-CMT scaling (Form A or B per compartment):\n[scaling]\n obs_scale[CMT=1] = 1000 # CMT 1 in ng/mL, dose in ug\n obs_scale[CMT=2] = 1 # CMT 2 already in correct units\nForm C – ODE readout expression (forces gradient = fd):For ODE models, y = <expr> defines the observation as a function ofstate variables, thetas, etas, and individual parameters. This replacesobs_cmt= argument in [structural_model] and is requiredthe observation is not a raw compartment amount:\n[structural_model]\n ode(states = [depot, central], ...) # no obs_cmt= here\n\n[scaling]\n y = central / V # observation = central compartment / volume\n-CMT Form C: y[CMT=1] = central / V1, y[CMT=2] = central2 / V2.Constraints:[scaling] is not yet supported on SDE ([diffusion]) models.Forms B, per-CMT, and Form C force finite-difference gradients; addgradient = fd to [fit_options] explicitly, or theraises an informative error.”, “## block)an optional [scaling] block to the .ferx model file tomodel predictions before comparing them to the observations in DV.is useful when the model is parameterised in one unit (e.g. amounts in) but DV is recorded in another (e.g. concentrations in ng/mL).engine divides each prediction by the scale factor before computing, so the fitted sigma is on the DV scale.Form A – scalar divisor (safe with AD):\n[scaling]\n obs_scale = 1000 # divide every predicted value by 1000\nForm B – expression divisor (forces gradient = fd):\n[scaling]\n obs_scale = V # divide by the individual volume (theta/eta expression)\nmay reference thetas, etas, and [individual_parameters]variables. AD gradients are not supported for Form B; addgradient = fd to [fit_options] or the engine will error.Per-CMT scaling (Form A or B per compartment):\n[scaling]\n obs_scale[CMT=1] = 1000 # CMT 1 in ng/mL, dose in ug\n obs_scale[CMT=2] = 1 # CMT 2 already in correct units\nForm C – ODE readout expression (forces gradient = fd):For ODE models, y = <expr> defines the observation as a function ofstate variables, thetas, etas, and individual parameters. This replacesobs_cmt= argument in [structural_model] and is requiredthe observation is not a raw compartment amount:\n[structural_model]\n ode(states = [depot, central], ...) # no obs_cmt= here\n\n[scaling]\n y = central / V # observation = central compartment / volume\n-CMT Form C: y[CMT=1] = central / V1, y[CMT=2] = central2 / V2.Constraints:[scaling] is not yet supported on SDE ([diffusion]) models.Forms B, per-CMT, and Form C force finite-difference gradients; addgradient = fd to [fit_options] explicitly, or theraises an informative error.” )c(”## Parameter declaration syntax (Theta (fixed effects):\ntheta CL(0.134, 0.001, 10.0) # (initial, lower, upper)\n theta CL(0.1, 0.001) # lower-bound only (no upper bound)\n theta CL(0.134, 0.001, 10.0) (FIX) # FIX at end (traditional)\n theta CL (FIX) (0.134, 0.001, 10.0) # FIX anywhere (flexible placement)\nOmega (inter-individual variability):\nomega ETA_CL ~ 0.07 # variance parameterisation (default)\n omega ETA_CL ~ 0.07 (FIX) # fixed omega, FIX at end\n omega ETA_CL (FIX) ~ 0.07 # fixed omega, FIX before the tilde\nsame flexible (FIX) placement applies to sigma andkappa (IOV) declarations.Unused-parameter warning: a warning severity message with\"unused_parameter\" is emitted by the parser when ais declared in [parameters] but never referenced in[individual_parameters] or [error_model]. This usuallya commented-out expression or a typo in the parameter name.ferx_warnings(fit) or check ferx_model_validate()before fitting.”, “## list("[parameters]")Theta (fixed effects):\ntheta CL(0.134, 0.001, 10.0) # (initial, lower, upper)\n theta CL(0.1, 0.001) # lower-bound only (no upper bound)\n theta CL(0.134, 0.001, 10.0) (FIX) # FIX at end (traditional)\n theta CL (FIX) (0.134, 0.001, 10.0) # FIX anywhere (flexible placement)\nOmega (inter-individual variability):\nomega ETA_CL ~ 0.07 # variance parameterisation (default)\n omega ETA_CL ~ 0.07 (FIX) # fixed omega, FIX at end\n omega ETA_CL (FIX) ~ 0.07 # fixed omega, FIX before the tilde\nsame flexible (FIX) placement applies to sigma andkappa (IOV) declarations.Unused-parameter warning:* a warning severity message with\"unused_parameter\" is emitted by the parser when ais declared in [parameters] but never referenced in[individual_parameters] or [error_model]. This usuallya commented-out expression or a typo in the parameter name.ferx_warnings(fit) or check ferx_model_validate()before fitting.”, “## block)Theta (fixed effects):\ntheta CL(0.134, 0.001, 10.0) # (initial, lower, upper)\n theta CL(0.1, 0.001) # lower-bound only (no upper bound)\n theta CL(0.134, 0.001, 10.0) (FIX) # FIX at end (traditional)\n theta CL (FIX) (0.134, 0.001, 10.0) # FIX anywhere (flexible placement)\nOmega (inter-individual variability):\nomega ETA_CL ~ 0.07 # variance parameterisation (default)\n omega ETA_CL ~ 0.07 (FIX) # fixed omega, FIX at end\n omega ETA_CL (FIX) ~ 0.07 # fixed omega, FIX before the tilde\nsame flexible (FIX) placement applies to sigma andkappa (IOV) declarations.Unused-parameter warning:* a warning severity message with\"unused_parameter\" is emitted by the parser when ais declared in [parameters] but never referenced in[individual_parameters] or [error_model]. This usuallya commented-out expression or a typo in the parameter name.ferx_warnings(fit) or check ferx_model_validate()before fitting.” )c(“## Steady-state dosing (a dosing row as a steady-state dose by setting SS = 1 (orSS = 2) in the NONMEM CSV and providing the dosing intervalII (in the same time units as TIME).SS = 1: the subject is assumed to be at steadybefore this dose. The engine computes the steady-stateconditions analytically (for 1/2/3-cpt models) or viaexpansion (for ODE models) and uses them as the startingstate.SS = 2: the steady-state concentration is addedthe current compartment state (superposition).II: dosing interval (required when SS > 0).match the units of TIME. Rows with SS = 0 or missingSS ignore II.changes to the .ferx model file are needed. Steady-stateis purely data-driven and works for all PK model families(1-cpt, 2-cpt, 3-cpt oral/IV/infusion, ODE-based).”, “## list("SS")a dosing row as a steady-state dose by setting SS = 1 (orSS = 2) in the NONMEM CSV and providing the dosing intervalII (in the same time units as TIME).SS = 1: the subject is assumed to be at steadybefore this dose. The engine computes the steady-stateconditions analytically (for 1/2/3-cpt models) or viaexpansion (for ODE models) and uses them as the startingstate.SS = 2: the steady-state concentration is addedthe current compartment state (superposition).II: dosing interval (required when SS > 0).match the units of TIME. Rows with SS = 0 or missingSS ignore II.changes to the .ferx model file are needed. Steady-stateis purely data-driven and works for all PK model families(1-cpt, 2-cpt, 3-cpt oral/IV/infusion, ODE-based).”, “## anda dosing row as a steady-state dose by setting SS = 1 (orSS = 2) in the NONMEM CSV and providing the dosing intervalII (in the same time units as TIME).SS = 1: the subject is assumed to be at steadybefore this dose. The engine computes the steady-stateconditions analytically (for 1/2/3-cpt models) or viaexpansion (for ODE models) and uses them as the startingstate.SS = 2: the steady-state concentration is addedthe current compartment state (superposition).II: dosing interval (required when SS > 0).match the units of TIME. Rows with SS = 0 or missingSS ignore II.changes to the .ferx model file are needed. Steady-stateis purely data-driven and works for all PK model families(1-cpt, 2-cpt, 3-cpt oral/IV/infusion, ODE-based).”, “## list("II")a dosing row as a steady-state dose by setting SS = 1 (orSS = 2) in the NONMEM CSV and providing the dosing intervalII (in the same time units as TIME).SS = 1: the subject is assumed to be at steadybefore this dose. The engine computes the steady-stateconditions analytically (for 1/2/3-cpt models) or viaexpansion (for ODE models) and uses them as the startingstate.SS = 2: the steady-state concentration is addedthe current compartment state (superposition).II: dosing interval (required when SS > 0).match the units of TIME. Rows with SS = 0 or missingSS ignore II.changes to the .ferx model file are needed. Steady-stateis purely data-driven and works for all PK model families(1-cpt, 2-cpt, 3-cpt oral/IV/infusion, ODE-based).”, “## columns)a dosing row as a steady-state dose by setting SS = 1 (orSS = 2) in the NONMEM CSV and providing the dosing intervalII (in the same time units as TIME).SS = 1: the subject is assumed to be at steadybefore this dose. The engine computes the steady-stateconditions analytically (for 1/2/3-cpt models) or viaexpansion (for ODE models) and uses them as the startingstate.SS = 2: the steady-state concentration is addedthe current compartment state (superposition).II: dosing interval (required when SS > 0).match the units of TIME. Rows with SS = 0 or missingSS ignore II.changes to the .ferx model file are needed. Steady-stateis purely data-driven and works for all PK model families(1-cpt, 2-cpt, 3-cpt oral/IV/infusion, ODE-based).” )## Specifying model and data

There are two equivalent ways to supply the model and data: 1. Path strings (classic style):

ferx_fit("pk.ferx", "data.csv")
ferx_fit(model = "pk.ferx", data = "data.csv")

2. ferx_model object (pipe style):

"data.csv" |> ferx_model("pk.ferx") |> ferx_fit()

Both dispatch to the same Rust backend. The ferx_model form is convenient when combined with [ferx_set_section](ferx_set_section.qmd) to modify model options in the same chain (see Examples). The data path stored in the ferx_model object can always be overridden by supplying dataexplicitly to ferx_fit().

Controlling estimation

Estimation method - pass a single method or a vector to chain methods in sequence (each stage seeds the next with its converged parameters):

ferx_fit(m, d, method = "focei")
ferx_fit(m, d, method = c("saem", "focei"))  # SAEM warm-start, FOCEI polish

Standard errors - the covariance step is on by default:

ferx_fit(m, d, covariance = TRUE)   # default: produces SE / 
ferx_fit(m, d, covariance = FALSE)  # skip for speed during development

# Tune the FD step for the Hessian (default 0.01):
ferx_fit(m, d, fd_hessian_step = 0.1)    # increase when Hessian is ill-conditioned
ferx_fit(m, d, fd_hessian_step = 1e-3)   # decrease for very smooth surfaces

Parallelism - cap the Rust thread pool:

ferx_fit(m, d, threads = parallel::detectCores(logical = FALSE))

LOQ censoring:

ferx_fit(m, d, bloq_method = "m3")    # M3 likelihood (CENS: 1=LLOQ, -1=ULOQ)
ferx_fit(m, d, bloq_method = "drop")  # disable M3; use rows as observed

Gradient method for the inner EBE loop:

ferx_fit(m, d, gradient = "auto")  # default: engine-selected
ferx_fit(m, d, gradient = "fd")    # force finite differences

Optimizer trace - write a per-iteration CSV for convergence diagnostics:

fit <- ferx_fit(m, d, optimizer_trace = TRUE)
ferx_plot_trace(fit)

c(“## Fine-tuning withsettings argument forwards a named list of low-level options toRust backend without requiring a new wrapper release. Arguments withparameters (e.g. method, covariance) cannot bein settings - pass them via the named argument.the complete settings reference - including which options apply to which, valid combinations, and a convergence troubleshooting guide - see(https://ferx-nlme.github.io/model-dsl/fit-options.html).Outer optimizer selection:\nferx_fit(m, d, settings = list(optimizer = \"auto\")) # default\nferx_fit(m, d, settings = list(optimizer = \"bobyqa\")) # derivative-free\nferx_fit(m, d, settings = list(optimizer = \"slsqp\")) # gradient-based\nferx_fit(m, d, settings = list(optimizer = \"lbfgs\"))\nferx_fit(m, d, settings = list(optimizer = \"bfgs\"))\nferx_fit(m, d, settings = list(optimizer = \"trust_region\"))\nferx_fit(m, d, settings = list(optimizer = \"mma\"))\nIteration cap and inner loop:\nferx_fit(m, d, settings = list(\n maxiter = 500L,\n inner_maxiter = 100L,\n inner_tol = 1e-6\n))\nSAEM tuning:\nferx_fit(m, d, method = \"saem\", settings = list(\n n_exploration = 200,\n n_convergence = 400,\n n_mh_steps = 3,\n omega_burnin = 20L,\n seed = 42L\n))\n\n# HMC proposals:\nferx_fit(m, d, method = \"saem\", settings = list(n_leapfrog = 3L))\nSIR uncertainty (requires sir = TRUE, covariance = TRUE):\nferx_fit(m, d, sir = TRUE, settings = list(\n sir_samples = 2000L,\n sir_resamples = 500L,\n sir_seed = 1L\n))\nGlobal search before local refinement:\nferx_fit(m, d, settings = list(\n global_search = TRUE,\n global_maxeval = 1000L\n))\nTrust-region CG budget:\nferx_fit(m, d, settings = list(\n optimizer = \"trust_region\",\n steihaug_max_iters = 100L\n))\nConvergence tolerance for difficult subjects:max_unconverged_frac (numeric in [0, 1]) flags the fit as convergedif up to this fraction of subjects failed their inner EBE loop, insteadraising an error - useful for large datasets with a handful of difficult. min_obs_for_convergence_check (non-negative integer) is thenumber of observations a subject must have before its inner-loopcounts toward that fraction; sparser subjects are excluded.\nferx_fit(m, d, settings = list(\n max_unconverged_frac = 0.1,\n min_obs_for_convergence_check = 2L\n))\nStagnation guard (NLopt-based optimizers):NLopt-based outer optimizers (BOBYQA, SLSQP, L-BFGS, MMA) short-circuit byonce recent evaluations show no OFV improvement above 1e-3 over awindow, letting them terminate quickly instead of burning throughmaxiter at full inner-loop cost. Set stagnation_guard = FALSEto disable this and let the optimizer run to its natural termination- useful when debugging or for problems with genuinely-but-real OFV improvements.\nferx_fit(m, d, settings = list(stagnation_guard = FALSE))\nMulti-start optimization:Runs n_starts independent fits from perturbed initial values in(via rayon) and returns the converged result with the lowest OFV.n_starts = 1 disables multi-start (single run, no overhead).for models prone to local minima: Michaelis-Menten elimination,-block omega, or many covariate parameters. On an 8-core machinen_starts = 8 costs approximately the same wall-clock time as a single run. always uses the exact initial values from the model file...n apply a log-space perturbation of size start_sigma(default 0.3) to each theta.\n# 8 parallel starts (approx. same wall time as one run on 8 cores)\nferx_fit(m, d, settings = list(n_starts = 8L))\n\n# Wider perturbation for ridge-shaped surfaces (e.g. Michaelis-Menten):\nferx_fit(m, d, settings = list(n_starts = 8L, start_sigma = 0.5))\n\n# Pin the perturbation RNG seed for reproducibility (independent of the SAEM seed):\nferx_fit(m, d, settings = list(n_starts = 8L, multi_start_seed = 123L))\n”, ”## list("settings")settings argument forwards a named list of low-level options toRust backend without requiring a new wrapper release. Arguments withparameters (e.g. method, covariance) cannot bein settings - pass them via the named argument.the complete settings reference - including which options apply to which, valid combinations, and a convergence troubleshooting guide - see(https://ferx-nlme.github.io/model-dsl/fit-options.html).Outer optimizer selection:\nferx_fit(m, d, settings = list(optimizer = \"auto\")) # default\nferx_fit(m, d, settings = list(optimizer = \"bobyqa\")) # derivative-free\nferx_fit(m, d, settings = list(optimizer = \"slsqp\")) # gradient-based\nferx_fit(m, d, settings = list(optimizer = \"lbfgs\"))\nferx_fit(m, d, settings = list(optimizer = \"bfgs\"))\nferx_fit(m, d, settings = list(optimizer = \"trust_region\"))\nferx_fit(m, d, settings = list(optimizer = \"mma\"))\nIteration cap and inner loop:\nferx_fit(m, d, settings = list(\n maxiter = 500L,\n inner_maxiter = 100L,\n inner_tol = 1e-6\n))\nSAEM tuning:\nferx_fit(m, d, method = \"saem\", settings = list(\n n_exploration = 200,\n n_convergence = 400,\n n_mh_steps = 3,\n omega_burnin = 20L,\n seed = 42L\n))\n\n# HMC proposals:\nferx_fit(m, d, method = \"saem\", settings = list(n_leapfrog = 3L))\nSIR uncertainty (requires sir = TRUE, covariance = TRUE):\nferx_fit(m, d, sir = TRUE, settings = list(\n sir_samples = 2000L,\n sir_resamples = 500L,\n sir_seed = 1L\n))\nGlobal search before local refinement:\nferx_fit(m, d, settings = list(\n global_search = TRUE,\n global_maxeval = 1000L\n))\nTrust-region CG budget:\nferx_fit(m, d, settings = list(\n optimizer = \"trust_region\",\n steihaug_max_iters = 100L\n))\nConvergence tolerance for difficult subjects:max_unconverged_frac (numeric in [0, 1]) flags the fit as convergedif up to this fraction of subjects failed their inner EBE loop, insteadraising an error - useful for large datasets with a handful of difficult. min_obs_for_convergence_check (non-negative integer) is thenumber of observations a subject must have before its inner-loopcounts toward that fraction; sparser subjects are excluded.\nferx_fit(m, d, settings = list(\n max_unconverged_frac = 0.1,\n min_obs_for_convergence_check = 2L\n))\nStagnation guard (NLopt-based optimizers):NLopt-based outer optimizers (BOBYQA, SLSQP, L-BFGS, MMA) short-circuit byonce recent evaluations show no OFV improvement above 1e-3 over awindow, letting them terminate quickly instead of burning throughmaxiter at full inner-loop cost. Set stagnation_guard = FALSEto disable this and let the optimizer run to its natural termination- useful when debugging or for problems with genuinely-but-real OFV improvements.\nferx_fit(m, d, settings = list(stagnation_guard = FALSE))\nMulti-start optimization:Runs n_starts independent fits from perturbed initial values in(via rayon) and returns the converged result with the lowest OFV.n_starts = 1 disables multi-start (single run, no overhead).for models prone to local minima: Michaelis-Menten elimination,-block omega, or many covariate parameters. On an 8-core machinen_starts = 8 costs approximately the same wall-clock time as a single run. always uses the exact initial values from the model file...n apply a log-space perturbation of size start_sigma(default 0.3) to each theta.\n# 8 parallel starts (approx. same wall time as one run on 8 cores)\nferx_fit(m, d, settings = list(n_starts = 8L))\n\n# Wider perturbation for ridge-shaped surfaces (e.g. Michaelis-Menten):\nferx_fit(m, d, settings = list(n_starts = 8L, start_sigma = 0.5))\n\n# Pin the perturbation RNG seed for reproducibility (independent of the SAEM seed):\nferx_fit(m, d, settings = list(n_starts = 8L, multi_start_seed = 123L))\n” )## Post-fit outputs and pipe extensions

ferx_fit() returns a ferx_fit object. All of the following work both as standalone calls and as the tail of a |> pipe:

fit |> print()              # full parameter table
fit |> summary()            # compact diagnostic summary
fit |> ferx_estimates()     # tidy data frame: theta / omega / sigma + SE / 
fit |> ferx_cor_matrix()    # parameter correlation matrix (needs covariance = TRUE)
fit |> ferx_model_inspect() # model structure auto-derived by the engine
fit |> ferx_plot_trace()    # convergence trace (needs optimizer_trace = TRUE)

Diagnostics data frame (PRED, IPRED, CWRES, ETAs, …) lives in fit$sdtab and can be used directly:

fit$sdtab
fit$ebe_etas
fit$individual_estimates

Fit options reference - full documentation of every [fit_options] key and settings knob, organised by method with a convergence-troubleshooting guide and a method x option compatibility table.
[ferx_fit_async](ferx_fit_async.qmd) - non-blocking version for long runs.
[ferx_check_init](ferx_check_init.qmd) - 5-iteration pilot to validate starting values before a full run.
[ferx_inits_from_nca](ferx_inits_from_nca.qmd) - NCA-derived starting values.
[ferx_warnings](ferx_warnings.qmd) - structured warnings from the fit.
[ferx_estimates](ferx_estimates.qmd) - tidy parameter table with SE /
[ferx_plot_trace](ferx_plot_trace.qmd) - convergence trace plot.

Other fitting: [ferx_check_init](ferx_check_init.qmd)(), [ferx_collect](ferx_collect.qmd)(), [ferx_fit_async](ferx_fit_async.qmd)(), [ferx_inits_from_nca](ferx_inits_from_nca.qmd)(), [ferx_sir](ferx_sir.qmd)()

Concept

fitting

Value

A named list. Where to find commonly-needed quantities:

What you want	Accessor	Shape	When present
Residuals, PRED, IPRED, diagnostics	`fit$sdtab`	one row per observation	always
Covariates echoed per observation (via `[output]`)	`fit$sdtab`	one row per observation, LOCF	declared in `[output]`
Raw covariate values for all dataset records	`fit$covtab`	one row per dataset record (doses + obs)	model has `[covariates]` block
ETA / EBE values per observation row	`fit$sdtab` (`ETA_CL`, `ETA_V`, …)	one row per observation, value repeated	always
ETA / EBE values per subject	`fit$ebe_etas`	one row per subject	model declares etas
Individual PK parameters per subject	`fit$individual_estimates`	one row per subject	always

Full slot descriptions:

Examples

ex <- ferx_example("warfarin")
fit <- ferx_fit(ex$model, ex$data, method = "gn", covariance = FALSE)
fit$theta
fit$ofv


ex <- ferx_example("warfarin")

# -- Classic style (path strings) ------------------------------------------

# Minimal call: FOCEI with default optimizer, covariance step on
fit <- ferx_fit(ex$model, ex$data)

# Named arguments (fully equivalent)
fit <- ferx_fit(model = ex$model, data = ex$data, method = "focei")

# All common options at once
fit <- ferx_fit(ex$model, ex$data,
  method      = "focei",
  covariance  = TRUE,
  threads     = 4L,
  gradient    = "auto",
  verbose     = TRUE,
  scale_params = FALSE
)

# SAEM warm-start, then FOCEI polish
fit <- ferx_fit(ex$model, ex$data, method = c("saem", "focei"))

# Fine-tune via settings
fit <- ferx_fit(ex$model, ex$data,
  method   = "focei",
  settings = list(
    optimizer     = "slsqp",
    maxiter       = 500L,
    inner_maxiter = 100L,
    inner_tol     = 1e-6
  )
)

# LOQ-censored observations (M3 method)
bloq <- ferx_example("warfarin_bloq")
fit  <- ferx_fit(bloq$model, bloq$data, method = "focei", bloq_method = "m3")

# SIR parameter uncertainty
fit <- ferx_fit(ex$model, ex$data,
  sir      = TRUE,
  settings = list(sir_samples = 2000L, sir_resamples = 500L)
)

# -- Pipe style (ferx_model object) -----------------------------------------

# Basic pipe: data flows into ferx_model() which bundles the model path
ex$data |>
  ferx_model(ex$model) |>
  ferx_fit(method = "focei", covariance = TRUE)

# Modify a model section, then fit
ex$data |>
  ferx_model(ex$model) |>
  ferx_set_section("fit_options", c(
    "  method     = focei",
    "  maxiter    = 500",
    "  covariance = true"
  )) |>
  ferx_fit()

# Full pipeline including post-fit outputs
ex$data |>
  ferx_model(ex$model) |>
  ferx_set_section("fit_options", c("  method = focei")) |>
  ferx_fit(covariance = TRUE, threads = 4L,
           settings = list(optimizer = "slsqp")) |>
  summary()

# Inspect then continue (ferx_get_section returns the ferx_model invisibly)
ex$data |>
  ferx_model(ex$model) |>
  ferx_get_section("parameters") |>
  ferx_fit() |>
  ferx_estimates()

# Override data stored in ferx_model at fit time
# (substitute the path to your own dataset for "other_cohort.csv")
m <- ferx_model(ex$data, ex$model)
ferx_fit(m, data = "other_cohort.csv", method = "focei")

# -- Post-fit outputs -------------------------------------------------------

fit <- ferx_fit(ex$model, ex$data, covariance = TRUE, optimizer_trace = TRUE)

summary(fit)              # compact diagnostic table
ferx_estimates(fit)       # tidy data frame with SE and %RSE
ferx_cor_matrix(fit)      # parameter correlation matrix
ferx_model_inspect(fit)   # model structure auto-derived by the engine
ferx_plot_trace(fit)      # convergence plot (optimizer_trace = TRUE required)

fit$sdtab                 # per-observation diagnostics (PRED, IPRED, CWRES, etc.)
fit$ebe_etas              # per-subject empirical Bayes ETAs
fit$individual_estimates  # per-subject individual PK parameters
fit$eigenvalues           # sorted eigenvalues of parameter correlation matrix
fit$condition_number      # > 1000 flags potential ill-conditioning

# Covariance diagnostics in a pipe
ferx_fit(ex$model, ex$data, covariance = TRUE) |> ferx_cor_matrix()

# -- SDE model (Extended Kalman Filter) -------------------------------------

# The [diffusion] block in the .ferx file enables SDE mode.
# DIFF_CENTRAL is a within-subject process-noise variance on the central
# compartment.  SDE models use finite differences for this step.
sde <- ferx_example("warfarin_sde")
fit_sde <- ferx_fit(sde$model, sde$data)
fit_sde$uses_sde                  # TRUE
fit_sde$theta["DIFF_CENTRAL"]     # fitted diffusion variance
ferx_estimates(fit_sde)           # DIFF_CENTRAL appears in theta block

# -- Multi-start (avoid local minima) ----------------------------------------
ex <- ferx_example("warfarin")
fit_ms <- ferx_fit(ex$model, ex$data, settings = list(n_starts = 4L))
fit_ms$ofv

# -- NCA-based starting values (inits_from_nca) ------------------------------
# `inits_from_nca` is designed to mitigate stalling and local-minimum traps
# in **gradient-based estimation methods**, which navigate the likelihood
# surface using a local Jacobian / Hessian and so are sensitive to where the
# starting thetas sit. In ferx those are:
#   - method = "gn"         (FOCE Gauss-Newton)
#   - method = "gn_hybrid"  (FOCE-GN with SLSQP polish)
#   - method = "foce" / "focei" run with settings = list(optimizer =
#     "trust_region") or settings = list(optimizer = "slsqp") (the
#     default "auto" resolves to derivative-free "bobyqa" on these
#     poorly-started ODE/PD fits, which tolerates poor starts the best;
#     switch to slsqp / trust_region for gradient-based behaviour, then
#     NCA-derived starts help most)
# For these, NCA-derived starts often turn a non-converging or stagnating
# fit into a clean convergence.
#
# Stochastic / sampling-based methods ("saem", "imp") explore the space
# globally and are far less sensitive to starting values, so `inits_from_nca`
# gives a smaller benefit. The better tool for them is multi-start (run
# several fits from perturbed starts and keep the lowest OFV) - see the
# Multi-start section above (`settings = list(n_starts = 4L)`), or chain
# SAEM into FOCEI: `method = c("saem", "focei")`.

# TRUE = default strategy ("nca_sweep"); rescues a trust_region fit that
# would otherwise stall on poor defaults.
fit <- ferx_fit(ex$model, ex$data,
  settings       = list(optimizer = "trust_region"),
  inits_from_nca = TRUE
)

# Pick a strategy explicitly. "nca_ebe" handles large between-subject
# variability better than "nca_sweep" but falls back to it for ODE models.
fit <- ferx_fit(ex$model, ex$data, method = "gn",
                inits_from_nca = "nca_ebe")

# Fastest variant: NCA arithmetic only, no grid sweep.
fit <- ferx_fit(ex$model, ex$data, inits_from_nca = "nca")

# Same flag from the model file's [fit_options] (the call-time arg wins
# and warns on conflict). Inspect what NCA produces without fitting via
# `ferx_inits_from_nca()` -- see ?ferx_inits_from_nca.
#
# Combine with multi-start when even good inits aren't enough - good starts
# PLUS a small perturbed ensemble is often the most robust setup.
fit <- ferx_fit(ex$model, ex$data,
  method         = "gn",
  inits_from_nca = "nca_sweep",
  settings       = list(n_starts = 4L)
)

# -- Deep Compartment Model (covariate neural network) -----------------------
# Requires ferx-r built with the `nn` cargo feature.
# The [covariate_nn TYPICAL_PK] block replaces analytical covariate functions
# with a small MLP: WT + CRCL -> multiplicative modulator on TVCL/TVV1/etc.
# ferx_fit() call is identical to any other model.
dcm <- ferx_example("warfarin_dcm")
fit_dcm <- ferx_fit(dcm$model, dcm$data, method = "focei")

# NN metadata is in fit$neural_networks (one sub-list per [covariate_nn] block)
fit_dcm$neural_networks[[1]]$name          # "TYPICAL_PK"
fit_dcm$neural_networks[[1]]$shape         # e.g. c(2, 8, 8, 5)
fit_dcm$neural_networks[[1]]$n_weights     # total weight + bias count
fit_dcm$neural_networks[[1]]$input_names   # c("WT", "CRCL")
fit_dcm$neural_networks[[1]]$output_names  # c("CL", "V1", "Q", "V2", "KA")

# NN weight thetas are suppressed from the THETA table in print();
# a NEURAL NETWORKS summary block is shown instead.
print(fit_dcm)

# See inst/examples/ex_warfarin_dcm.R for a full worked example including
# interpretability heuristics and comparison against a no-covariate baseline.