Error Model

Maturity: stable — see Feature Maturity for what this means.

The [error_model] block defines the residual error structure, specifying how observed data (DV) relates to model predictions.

Syntax

DV ~ ERROR_TYPE(SIGMA_PARAMS)

Available Error Models

Additive

DV ~ additive(SIGMA_NAME)

The residual variance is constant across all predictions:

\[ \text{Var}(DV) = \sigma^2 \]

Use when measurement error is independent of concentration (e.g., assay with fixed precision).

Proportional

DV ~ proportional(SIGMA_NAME)

The residual variance scales with the predicted value:

\[ \text{Var}(DV) = (\sigma \cdot f)^2 \]

where $f$ is the model prediction. Use when measurement error increases with concentration (most common in PK).

Combined

DV ~ combined(SIGMA_PROP, SIGMA_ADD)

Combines proportional and additive components:

\[ \text{Var}(DV) = (\sigma_1 \cdot f)^2 + \sigma_2^2 \]

Use when both proportional and additive error sources are present. Requires two sigma parameters defined in [parameters].

Time-varying / covariate-dependent magnitude

Maturity: experimental (#484).

Any sigma argument may be written as an expression instead of a bare parameter name, letting the residual-error magnitude depend on TIME, covariates, and thetas. This reproduces the NONMEM $ERROR idiom of a time- or covariate-dependent error coefficient, e.g.

PROP = THETA(2)
IF(TIME.GT.24) PROP = THETA(2) * THETA(3)
W = SQRT(THETA(3)**2 + PROP**2 * IPRED**2)

which in ferx becomes:

[parameters]
  sigma PROP_ERR ~ 0.1  (sd)
  sigma ADD_ERR  ~ 22.0 (sd)
  theta RUV_LATE(1.5, 0.0, 10.0)   # late-phase RUV inflation

[error_model]
  DV ~ combined(PROP_ERR * (if (TIME > 24.0) RUV_LATE else 1.0), ADD_ERR)

The expression is a per-observation multiplier on that sigma’s loading, so the variance contribution becomes

\[ \text{Var}_i(DV) = \sum_k \big(\text{loading}_{k,i} \cdot m_k(\text{TIME}_i, \text{cov}_i, \theta) \cdot \sigma_k\big)^2 \]

A bare sigma name keeps the legacy behaviour exactly ($m_k \equiv 1$).

Rules and restrictions (current).

An expression argument must reference exactly one declared sigma — the one whose magnitude it scales.
The magnitude may depend only on TIME, declared covariates, thetas, and the scaled sigma. It may not depend on a random effect (η), on an individual parameter, or on the prediction (IPRED/PRED/DV) — the multiplier must be independent of the inner empirical-Bayes loop. TAD/TAFD are not yet available.
The TIME fed to the expression is the raw data-file time (matching NONMEM’s $ERROR).
Supported for method = foce and method = focei only. SAEM, GN, GN-hybrid, and importance-sampling are rejected up front.
Analytic gradient (#576/#486). The inner EBE η-gradient and the outer θ/σ population gradient are both analytic for a magnitude-active model under FOCE and FOCEI — the magnitude is η-independent, so the inner loop just threads its per-observation multiplier into the variance/its f-derivative; the outer loop differentiates the magnitude program itself (a new direct-θ term in ∂R/∂θ, since mult(θ) enters the residual variance independent of the prediction). For plain (non-interaction) FOCE the same magnitude enters the Sheiner–Beal marginal through its typical-value residual variance R⁰ (value and direct-θ derivative), on both the non-IOV and IOV paths. A few combinations still fall back to finite differences (which remain magnitude-aware, so fits stay correct, just slower): block_sigma correlated residual error, iiv_on_ruv, an M3-BLOQ censored row, and more than 32 thetas.

Log-transform-both-sides (LTBS)

For data whose residual error is multiplicative (a constant CV across the concentration range), an alternative to the proportional model is to fit on the log scale: both the observation and the prediction are log-transformed and an additive error is applied on the log scale. This matches NONMEM’s Y = LOG(F) + EPS(1) convention and is the natural choice when importing a model or dataset from NONMEM.

There are two ways to declare it, depending on the scale of the DV column in your dataset:

`log(DV) ~ additive(SIGMA)` — DV on the natural scale

[parameters]
  sigma ADD_LOG ~ 0.1     # additive SD on the LOG scale

[error_model]
  log(DV) ~ additive(ADD_LOG)

The engine log-transforms the DV column itself (once, at load), and compares it to log(prediction). Use this when your data holds concentrations on the natural scale.

`DV ~ log_additive(SIGMA)` — DV already log-transformed

[error_model]
  DV ~ log_additive(ADD_LOG)

Use this when the DV column is already log-transformed in the dataset (e.g. exported from a NONMEM workflow that pre-logged the data). The engine takes DV as-is and log-transforms only the prediction. log_additive is additive error on the log scale.

Output scale

Under LTBS, everything is reported on the log scale, matching NONMEM: IPRED/PRED, IWRES/CWRES, and simulated DV are all on the log scale. Back-transform with exp() if you need natural-scale values.

The likelihood term is the additive form on the log scale:

\[ \text{Var}(\log DV) = \sigma^2, \qquad \text{IWRES} = \frac{\log DV - \log f}{\sigma} \]

Restrictions

Additive only. LTBS pairs with additive error on the log scale; log(DV) ~ proportional(...) / combined(...) are rejected at parse time.
Single endpoint. Not supported with per-CMT (multi-endpoint) error models.
No SDE. Not supported with a [diffusion] (SDE/EKF) model.
BLOQ/M3 is supported — the LLOQ is log-transformed alongside DV.
A non-positive DV under log(DV) ~ additive(...) cannot be log-transformed; it is floored to log(1e-12) and a warning is emitted (check your data scale, or use DV ~ log_additive(...) if the data is already log-transformed).

Multiple endpoints (per-CMT error models)

For simultaneous PK/PD (and other multi-analyte) models, a single observed compartment is not enough: plasma concentrations and a PD effect typically need different residual error models in the same joint likelihood. Prefix each error line with CMT=N: to assign a distinct error model to each observed compartment, dispatched by the dataset’s CMT column:

[error_model]
  CMT=2: DV ~ proportional(PROP_ERR_PK)   # plasma concentration (central)
  CMT=3: DV ~ additive(ADD_ERR_PD)        # PD effect (effect compartment)

Every observation row is matched to the endpoint whose CMT=N equals its CMT value, and its residual variance is computed from that endpoint’s error model and sigma(s). All endpoints contribute to one FOCEI objective, so PK and PD parameters — and their uncertainty — are estimated jointly. This is the gold-standard alternative to the sequential workaround (fit PK, freeze IPRED, then fit PD), which underestimates uncertainty.

Each endpoint’s sigma parameters are declared once in [parameters], as usual:

[parameters]
  sigma PROP_ERR_PK ~ 0.10 (sd)
  sigma ADD_ERR_PD  ~ 1.00 (sd)

Rules and restrictions:

ODE models only. Per-CMT dispatch lives in the finite-difference likelihood path used by ODE models. An analytical PK model with CMT=N: lines is rejected at parse time.
No mixing styles. An [error_model] block is either a single plain DV ~ ... line or all CMT=N: lines — not both.
Coverage is checked at fit time. Every observed CMT in the dataset must have a matching CMT=N: entry, or fit() errors and names the missing compartments. Duplicate CMT=N entries are rejected at parse time.
Estimation method. Supported with FOCE/FOCEI, the Gauss-Newton optimizers, and SAEM (optionally followed by imp).

A complete worked model lives in examples/emax_pkpd.ferx — an oral 1-compartment PK model with an effect-compartment Emax PD readout, proportional error on the plasma endpoint and additive error on the PD endpoint. The per-CMT readout (which compartment/expression each CMT maps to) is configured in the [scaling] block.

Covariate-selected error models (`if/else`)

Per-CMT dispatch keys on the dataset’s CMT column. When two endpoints share the same compartment/readout but need different residual error — a classic example is a free-vs-total assay, where both measurements are the same concentration readout but carry different proportional error — select the error model with an arbitrary covariate if/else instead, exactly as the [scaling] block selects a readout (Form C):

[error_model]
  if (FREE == 0) {
    DV ~ proportional(PROP_ERR_TOTAL)
  } else {
    DV ~ proportional(PROP_ERR_UNBOUND)
  }

Each observation’s residual error model is chosen by evaluating the conditions top-to-bottom against that row’s covariate snapshot and taking the first match; the mandatory final else catches every remaining row. else if chains any number of branches:

[error_model]
  if (ASSAY == 1) {
    DV ~ proportional(PROP_LCMS)
  } else if (ASSAY == 2) {
    DV ~ combined(PROP_ELISA, ADD_ELISA)
  } else {
    DV ~ additive(ADD_OTHER)
  }

This mirrors an analytic Form C readout that switches on the same per-row flag, so a model can express both the readout and its residual error against one covariate without recoding the flag into a synthetic CMT column:

[scaling]
  y = if (FREE == 0) central/V1 + BMAX*(central/V1)/(KD + central/V1) else central/V1

[error_model]
  if (FREE == 0) { DV ~ proportional(PROP_ERR_TOTAL) }
  else           { DV ~ proportional(PROP_ERR_UNBOUND) }

Rules and restrictions:

Analytical and ODE models. Unlike per-CMT error (ODE-only), covariate selection works on closed-form analytical PK models too — the selection is a per-observation covariate constant, so it flows through the same analytic FOCE/FOCEI σ-gradient channel once the endpoint is chosen.
A final else is required so every observation maps to a declared error model (no silent fall-through).
The selector covariate is a required data column. A covariate named in a condition must be present in the data, or fit() fails with E_MISSING_COVARIATE (it is never silently read as 0). Declaring it in [covariates] is recommended.
Estimation method. Supported with FOCE/FOCEI, the Gauss-Newton optimizers, SAEM, and importance sampling — every likelihood/gradient path dispatches on the selected endpoint.
block_sigma correlated residuals are supported together with a covariate-selected error model (#669). Each observation resolves to a branch per row, loads that branch’s sigmas, and rows sharing a subject time and occasion pick up the block_sigma cross covariance in the dense residual R — exactly the pairing rule used for per-CMT endpoints. Two co-temporal rows resolving to different branches (e.g. a total FREE=0 and an unbound FREE=1 measurement of the same sample) get the honest cross-branch covariance ρ·σ_i·σ_j from each row’s own loadings. The method restrictions on block_sigma (see the Sigma scale section) apply unchanged.
LTBS (log(DV) ~ ... / log_additive) is not supported inside a branch.
SDE / [diffusion] models are not supported. The EKF measurement-noise path uses a single error model and cannot switch per observation, so the combination is rejected at parse time. Use a single-endpoint error model.
Adaptive dosing (simulate_adaptive*) is not supported: the assay keys residual error by the monitored compartment, whereas a selected error model keys endpoints by covariate branch — the combination is rejected.
simulate() validates the selector covariate the same way fit() does: a selector covariate absent from the data is a hard E_MISSING_COVARIATE error, never silently read as 0.

Comparison with NONMEM

The equivalent NONMEM $ERROR block selects the residual error with IF:

IF (FREE.EQ.0) THEN
   Y = IPRED*(1 + EPS(1))     ; total: PROP_ERR_TOTAL = SD of EPS(1)
ELSE
   Y = IPRED*(1 + EPS(2))     ; unbound: PROP_ERR_UNBOUND = SD of EPS(2)
ENDIF

ferx’s if (FREE == 0) { DV ~ proportional(PROP_ERR_TOTAL) } else { DV ~ proportional(PROP_ERR_UNBOUND) } produces the same per-row residual variance (IPRED·σ)² with σ selected by FREE, and the same joint FOCEI objective — no synthetic CMT recoding required on either side.

Sigma scale

All sigma parameters are estimated and reported on the standard-deviation scale, not the variance scale. This is true for both proportional and additive components, and for both elements of a combined error model.

In particular:

Error model	Initial value `sigma = X` means …
`proportional`	The residual SD scales as `X · f`, i.e. CV% = `X · 100`
`additive`	The residual SD is `X` in the units of `DV` (no CV interpretation)
`combined`	First sigma is proportional (CV-style), second is additive (units of `DV`)

So sigma PROP_ERR ~ 0.1 for a proportional model is a 10% CV initial value, not 1%. Likewise, the SE on a proportional sigma is on the SD scale — multiply by 100 for an SE in CV-percentage points.

The fitted YAML emits both the SD (estimate) and the variance (variance: estimate²); for proportional components it also emits cv_pct = estimate · 100 so downstream tooling does not have to re-derive it.

Examples

Proportional error (most common):

[parameters]
  sigma PROP_ERR ~ 0.01

[error_model]
  DV ~ proportional(PROP_ERR)

Additive error:

[parameters]
  sigma ADD_ERR ~ 1.0

[error_model]
  DV ~ additive(ADD_ERR)

Combined error:

[parameters]
  sigma PROP_ERR ~ 0.1
  sigma ADD_ERR  ~ 0.5

[error_model]
  DV ~ combined(PROP_ERR, ADD_ERR)

Correlated combined error:

[parameters]
  block_sigma (PROP_ERR, ADD_ERR) = [0.04, 0.10, 1.00] FIX

[error_model]
  DV ~ combined(PROP_ERR, ADD_ERR)

block_sigma uses NONMEM-style lower-triangle covariance values: [Var(PROP_ERR), Cov(PROP_ERR, ADD_ERR), Var(ADD_ERR)]. Diagonal entries initialize the named sigma SDs (sqrt(variance)) and the off-diagonal entries define fixed residual correlations. For the example above, PROP_ERR = 0.2, ADD_ERR = 1.0, and Corr(PROP_ERR, ADD_ERR) = 0.5, so the residual variance for a combined-error observation is:

\[ V = (f \sigma_\mathrm{prop})^2 + \sigma_\mathrm{add}^2 + 2 f \rho \sigma_\mathrm{prop}\sigma_\mathrm{add} \]

The diagonal sigma SDs are estimated unless marked FIX; the off-diagonal covariance is always converted to a fixed correlation. A nonzero block_sigma covariance can connect sigmas that co-load on the same combined(...) endpoint, or sigmas used by different endpoints measured at the same subject time and occasion — whether those endpoints are selected by the CMT column (per-CMT) or by a covariate if/else selector (#669). In the cross-endpoint case ferx builds a subject-level residual covariance matrix, so paired rows such as total/unbound assays receive the NONMEM-style covariance term f_total * f_unbound * Cov(EPS_total, EPS_unbound).

Current scope: block_sigma is supported for the Gaussian method = foce, method = focei, method = saem, and method = imp fits. The Gauss-Newton optimizers (gn / gn_hybrid), M3 censored-observation likelihoods, FREM covariate pseudo-observations, iiv_on_ruv, TTE endpoints, and IOV reject correlated residual sigma blocks until their residual-error derivative code is extended beyond diagonal RUV. SDE/EKF process-noise models carry the correlation under method = foce / method = focei (the EKF variance is added to the dense R) but are rejected under method = saem, whose M-step data term does not yet include the process-noise inflation.

simulate() draws each observation’s residual independently and does not reproduce a block_sigma off-diagonal covariance — simulated total/unbound (or cross-branch, for a covariate-selected model) rows are uncorrelated even though the estimation objective honours the correlation. A VPC or posterior-predictive check of the correlated endpoints will therefore understate the residual covariance; this applies to every block_sigma model, per-CMT and covariate-selected alike (tracked in issue #672).

Under method = focei the off-diagonal covariance enters the interaction correction as well as the data term: the Almquist 2015 conditional Hessian becomes H̃ = HᵀR⁻¹H + ½·B + Ω⁻¹ with B_{kl} = tr(R⁻¹ ∂R/∂η_k R⁻¹ ∂R/∂η_l) over the full dense residual covariance R (it reduces to the diagonal FOCEI a'diag(1/R)a + ½c̃'c̃ when the correlation is zero). Under method = imp the importance weights already use the dense R; the Student-t proposal precision is also built from R⁻¹ so the correlated posterior shape is sampled efficiently.

Both loops of a foce / focei correlated fit run exact analytic gradients (no finite differences): the within-observation cross term is carried through the same dense-R derivative kernels the objective uses (compute_dr_df_matrices, compute_d2r_df2_matrices), so the inner EBE η-gradient and the outer θ/Ω/σ gradient are noise-free and the auto optimizer resolves to a gradient-based method. The OFV is unchanged (Eval 1 on the anchor below is still 18.722087). In the analytic scope (analytical 1-/2-/3-cpt, single endpoint) the correlation only couples the σ-loadings within one observation, so R stays diagonal and the dense assembly reduces to the scalar contractions fed correlation-aware (R, ∂R/∂f, ∂²R/∂f²); a genuine cross-endpoint off-diagonal R (which needs a per-CMT / Form-C readout) falls back to per-subject finite differences.

Validation: examples/correlated_residual_combined.ferx and nonmem_anchor/correlated_residual_combined.ctl run the same fixed-parameter one-compartment IV model with $SIGMA BLOCK(2) FIX. On data/correlated_residual_combined.csv, NONMEM 7.5.1 (METHOD=1 MAXEVAL=0) reports OFV 18.715849 and ferx FOCE reports OFV 18.727365 (difference 0.0115). Adding INTER to that control file (METHOD=1 INTER MAXEVAL=0) reports OFV 18.722087, and ferx method = focei reports OFV 18.722087 — agreement to better than 1e-5, since the FOCEI Laplace marginal evaluates R at the conditional mode η̂ exactly as NONMEM’s INTER does (no Sheiner–Beal linearization point to disagree on).

The SAEM path reuses the same dense residual-covariance likelihood (dense_residual_data_term) for its E-step Metropolis acceptance and its theta/sigma M-step objective. Its reported OFV (the FOCE-approximation 2·pop_nll used for AIC/BIC comparability) now follows the interaction flag exactly like FOCE/FOCEI: with interaction on (the default) it reports the dense interaction marginal — ferx SAEM OFV 18.7221, matching ferx FOCEI and NONMEM METHOD=1 INTER — and with interaction = false it reports the non-interaction marginal (18.7274, matching ferx FOCE). Previously SAEM always fell back to the non-interaction marginal for correlated models regardless of the flag; the correlation itself was, and is, carried in both cases. As a marginal-likelihood sanity check, NONMEM’s importance-sampling −2LL at these reference parameters (METHOD=IMP EONLY=1, 30 000 samples) is 18.68; dropping the off-diagonal covariance (a diagonal $SIGMA) lowers all engines to ≈14.62, so the ≈4-OFV gap is the cross-endpoint correlation being correctly accounted for.

IIV on residual error (`iiv_on_ruv`)

NONMEM models often place an inter-individual random scaling on the residual error — each subject gets a log-normally scaled residual SD:

Y = IPRED + EPS(1) * EXP(ETA(4))     ; $OMEGA <var>   ; IIV on RUV

ferx supports this with an iiv_on_ruv line in [error_model]. Declare the random effect as an ordinary omega (it is not referenced by any individual parameter), then name it:

[parameters]
  ...
  omega ETA_RUV ~ 0.05      # IIV on residual error

  sigma PROP_ERR ~ 0.1 (sd)

[error_model]
  DV ~ proportional(PROP_ERR)
  iiv_on_ruv = ETA_RUV

The per-subject residual variance of every observation is multiplied by exp(2 · ETA_RUV_i) — equivalently, the residual SD is scaled by exp(ETA_RUV_i). This applies uniformly to additive, proportional, and combined error (it scales the whole variance), matching EPS * EXP(ETA) for the common single-EPS case. The extra random effect is reported like any other omega (ETA_RUV appears in the omega matrix and eta names).

Notes and constraints:

Requires an interaction or Monte-Carlo method: use method = focei, imp, impmap, or saem. Y = IPRED + EPS*EXP(ETA) makes the residual variance η-dependent, so non-interaction FOCE cannot represent it — ferx rejects that combination with a clear error.
The named eta must be a dedicated omega not shared with a structural individual parameter.
Not supported with per-CMT (multi-endpoint) error models.
Exact analytic FOCE/FOCEI gradient. For analytical 1-/2-/3-cpt, ODE ([odes]), and LTBS (log_additive) models, iiv_on_ruv fits use the exact outer θ/Ω/σ gradient — the residual-eta enters through the variance scale exp(2·ETA_RUV), contributing the Almquist interaction column to H̃ and the matching ∂NLL/∂η_ruv = Σ(1−ε²/v) data term. The inner EBE gradient is analytic too except under LTBS, where it keeps finite differences (the established LTBS choice). IOV and M3-BLOQ iiv_on_ruv models keep the finite-difference gradient.

Identifiability. A residual-error random effect is a variance-of-variance term and is only weakly identified when the data carry little per-subject residual-scale signal (few observations per subject, or a small true variance). In that regime the marginal correctly shrinks the estimate toward zero. Strong or well-sampled signals are recovered well. When the FOCEI estimate looks collapsed, prefer the Monte-Carlo estimators (imp / impmap / saem), which do not rely on the Laplace approximation.

See examples/iiv_on_ruv.ferx.

Impact on Estimation

The error model affects: - Individual weighted residuals (IWRES): (DV - IPRED) / sqrt(Var) — with iiv_on_ruv, Var includes the per-subject exp(2·ETA_RUV) scale at the EBE. - Conditional weighted residuals (CWRES): Accounts for uncertainty in random effect estimates - Objective function value (OFV): The likelihood includes log(Var) terms, so the error model structure directly influences parameter estimates