Error Model
Maturity: stable — see Feature Maturity for what this means.
The [error_model] block defines the residual error structure, specifying how observed data (DV) relates to model predictions.
Syntax
DV ~ ERROR_TYPE(SIGMA_PARAMS)
Available Error Models
Additive
DV ~ additive(SIGMA_NAME)
The residual variance is constant across all predictions:
\[ \text{Var}(DV) = \sigma^2 \]
Use when measurement error is independent of concentration (e.g., assay with fixed precision).
Proportional
DV ~ proportional(SIGMA_NAME)
The residual variance scales with the predicted value:
\[ \text{Var}(DV) = (\sigma \cdot f)^2 \]
where \(f\) is the model prediction. Use when measurement error increases with concentration (most common in PK).
Combined
DV ~ combined(SIGMA_PROP, SIGMA_ADD)
Combines proportional and additive components:
\[ \text{Var}(DV) = (\sigma_1 \cdot f)^2 + \sigma_2^2 \]
Use when both proportional and additive error sources are present. Requires two sigma parameters defined in [parameters].
Time-varying / covariate-dependent magnitude
Maturity: experimental (#484).
Any sigma argument may be written as an expression instead of a bare parameter name, letting the residual-error magnitude depend on TIME, covariates, and thetas. This reproduces the NONMEM $ERROR idiom of a time- or covariate-dependent error coefficient, e.g.
PROP = THETA(2)
IF(TIME.GT.24) PROP = THETA(2) * THETA(3)
W = SQRT(THETA(3)**2 + PROP**2 * IPRED**2)
which in ferx becomes:
[parameters]
sigma PROP_ERR ~ 0.1 (sd)
sigma ADD_ERR ~ 22.0 (sd)
theta RUV_LATE(1.5, 0.0, 10.0) # late-phase RUV inflation
[error_model]
DV ~ combined(PROP_ERR * (if (TIME > 24.0) RUV_LATE else 1.0), ADD_ERR)
The expression is a per-observation multiplier on that sigma’s loading, so the variance contribution becomes
\[ \text{Var}_i(DV) = \sum_k \big(\text{loading}_{k,i} \cdot m_k(\text{TIME}_i, \text{cov}_i, \theta) \cdot \sigma_k\big)^2 \]
A bare sigma name keeps the legacy behaviour exactly (\(m_k \equiv 1\)).
Rules and restrictions (current).
- An expression argument must reference exactly one declared sigma — the one whose magnitude it scales.
- The magnitude may depend only on
TIME, declared covariates, thetas, and the scaled sigma. It may not depend on a random effect (η), on an individual parameter, or on the prediction (IPRED/PRED/DV) — the multiplier must be independent of the inner empirical-Bayes loop.TAD/TAFDare not yet available. - The
TIMEfed to the expression is the raw data-file time (matching NONMEM’s$ERROR). - Supported for
method = foceandmethod = foceionly. SAEM, GN, GN-hybrid, and importance-sampling are rejected up front. - Analytic gradient (#576/#486). The inner EBE η-gradient and the outer θ/σ population gradient are both analytic for a magnitude-active model under FOCE and FOCEI — the magnitude is η-independent, so the inner loop just threads its per-observation multiplier into the variance/its
f-derivative; the outer loop differentiates the magnitude program itself (a new direct-θ term in∂R/∂θ, sincemult(θ)enters the residual variance independent of the prediction). For plain (non-interaction) FOCE the same magnitude enters the Sheiner–Beal marginal through its typical-value residual varianceR⁰(value and direct-θ derivative), on both the non-IOV and IOV paths. A few combinations still fall back to finite differences (which remain magnitude-aware, so fits stay correct, just slower):block_sigmacorrelated residual error,iiv_on_ruv, an M3-BLOQ censored row, and more than 32 thetas.
Log-transform-both-sides (LTBS)
For data whose residual error is multiplicative (a constant CV across the concentration range), an alternative to the proportional model is to fit on the log scale: both the observation and the prediction are log-transformed and an additive error is applied on the log scale. This matches NONMEM’s Y = LOG(F) + EPS(1) convention and is the natural choice when importing a model or dataset from NONMEM.
There are two ways to declare it, depending on the scale of the DV column in your dataset:
log(DV) ~ additive(SIGMA) — DV on the natural scale
[parameters]
sigma ADD_LOG ~ 0.1 # additive SD on the LOG scale
[error_model]
log(DV) ~ additive(ADD_LOG)
The engine log-transforms the DV column itself (once, at load), and compares it to log(prediction). Use this when your data holds concentrations on the natural scale.
DV ~ log_additive(SIGMA) — DV already log-transformed
[error_model]
DV ~ log_additive(ADD_LOG)
Use this when the DV column is already log-transformed in the dataset (e.g. exported from a NONMEM workflow that pre-logged the data). The engine takes DV as-is and log-transforms only the prediction. log_additive is additive error on the log scale.
Output scale
Under LTBS, everything is reported on the log scale, matching NONMEM: IPRED/PRED, IWRES/CWRES, and simulated DV are all on the log scale. Back-transform with exp() if you need natural-scale values.
The likelihood term is the additive form on the log scale:
\[ \text{Var}(\log DV) = \sigma^2, \qquad \text{IWRES} = \frac{\log DV - \log f}{\sigma} \]
Restrictions
- Additive only. LTBS pairs with additive error on the log scale;
log(DV) ~ proportional(...)/combined(...)are rejected at parse time. - Single endpoint. Not supported with per-CMT (multi-endpoint) error models.
- No SDE. Not supported with a
[diffusion](SDE/EKF) model. - BLOQ/M3 is supported — the LLOQ is log-transformed alongside
DV. - A non-positive
DVunderlog(DV) ~ additive(...)cannot be log-transformed; it is floored tolog(1e-12)and a warning is emitted (check your data scale, or useDV ~ log_additive(...)if the data is already log-transformed).
Multiple endpoints (per-CMT error models)
For simultaneous PK/PD (and other multi-analyte) models, a single observed compartment is not enough: plasma concentrations and a PD effect typically need different residual error models in the same joint likelihood. Prefix each error line with CMT=N: to assign a distinct error model to each observed compartment, dispatched by the dataset’s CMT column:
[error_model]
CMT=2: DV ~ proportional(PROP_ERR_PK) # plasma concentration (central)
CMT=3: DV ~ additive(ADD_ERR_PD) # PD effect (effect compartment)
Every observation row is matched to the endpoint whose CMT=N equals its CMT value, and its residual variance is computed from that endpoint’s error model and sigma(s). All endpoints contribute to one FOCEI objective, so PK and PD parameters — and their uncertainty — are estimated jointly. This is the gold-standard alternative to the sequential workaround (fit PK, freeze IPRED, then fit PD), which underestimates uncertainty.
Each endpoint’s sigma parameters are declared once in [parameters], as usual:
[parameters]
sigma PROP_ERR_PK ~ 0.10 (sd)
sigma ADD_ERR_PD ~ 1.00 (sd)
Rules and restrictions:
- ODE models only. Per-CMT dispatch lives in the finite-difference likelihood path used by ODE models. An analytical PK model with
CMT=N:lines is rejected at parse time. - No mixing styles. An
[error_model]block is either a single plainDV ~ ...line or allCMT=N:lines — not both. - Coverage is checked at fit time. Every observed
CMTin the dataset must have a matchingCMT=N:entry, orfit()errors and names the missing compartments. DuplicateCMT=Nentries are rejected at parse time. - Estimation method. Supported with FOCE/FOCEI, the Gauss-Newton optimizers, and SAEM (optionally followed by
imp).
A complete worked model lives in examples/emax_pkpd.ferx — an oral 1-compartment PK model with an effect-compartment Emax PD readout, proportional error on the plasma endpoint and additive error on the PD endpoint. The per-CMT readout (which compartment/expression each CMT maps to) is configured in the [scaling] block.
Covariate-selected error models (if/else)
Per-CMT dispatch keys on the dataset’s CMT column. When two endpoints share the same compartment/readout but need different residual error — a classic example is a free-vs-total assay, where both measurements are the same concentration readout but carry different proportional error — select the error model with an arbitrary covariate if/else instead, exactly as the [scaling] block selects a readout (Form C):
[error_model]
if (FREE == 0) {
DV ~ proportional(PROP_ERR_TOTAL)
} else {
DV ~ proportional(PROP_ERR_UNBOUND)
}
Each observation’s residual error model is chosen by evaluating the conditions top-to-bottom against that row’s covariate snapshot and taking the first match; the mandatory final else catches every remaining row. else if chains any number of branches:
[error_model]
if (ASSAY == 1) {
DV ~ proportional(PROP_LCMS)
} else if (ASSAY == 2) {
DV ~ combined(PROP_ELISA, ADD_ELISA)
} else {
DV ~ additive(ADD_OTHER)
}
This mirrors an analytic Form C readout that switches on the same per-row flag, so a model can express both the readout and its residual error against one covariate without recoding the flag into a synthetic CMT column:
[scaling]
y = if (FREE == 0) central/V1 + BMAX*(central/V1)/(KD + central/V1) else central/V1
[error_model]
if (FREE == 0) { DV ~ proportional(PROP_ERR_TOTAL) }
else { DV ~ proportional(PROP_ERR_UNBOUND) }
Rules and restrictions:
- Analytical and ODE models. Unlike per-CMT error (ODE-only), covariate selection works on closed-form analytical PK models too — the selection is a per-observation covariate constant, so it flows through the same analytic FOCE/FOCEI σ-gradient channel once the endpoint is chosen.
- A final
elseis required so every observation maps to a declared error model (no silent fall-through). - The selector covariate is a required data column. A covariate named in a condition must be present in the data, or
fit()fails withE_MISSING_COVARIATE(it is never silently read as 0). Declaring it in[covariates]is recommended. - Estimation method. Supported with FOCE/FOCEI, the Gauss-Newton optimizers, SAEM, and importance sampling — every likelihood/gradient path dispatches on the selected endpoint.
block_sigmacorrelated residuals are supported together with a covariate-selected error model (#669). Each observation resolves to a branch per row, loads that branch’s sigmas, and rows sharing a subject time and occasion pick up theblock_sigmacross covariance in the dense residualR— exactly the pairing rule used for per-CMT endpoints. Two co-temporal rows resolving to different branches (e.g. a totalFREE=0and an unboundFREE=1measurement of the same sample) get the honest cross-branch covarianceρ·σ_i·σ_jfrom each row’s own loadings. The method restrictions onblock_sigma(see the Sigma scale section) apply unchanged.- LTBS (
log(DV) ~ .../log_additive) is not supported inside a branch. - SDE /
[diffusion]models are not supported. The EKF measurement-noise path uses a single error model and cannot switch per observation, so the combination is rejected at parse time. Use a single-endpoint error model. - Adaptive dosing (
simulate_adaptive*) is not supported: the assay keys residual error by the monitored compartment, whereas a selected error model keys endpoints by covariate branch — the combination is rejected. simulate()validates the selector covariate the same wayfit()does: a selector covariate absent from the data is a hardE_MISSING_COVARIATEerror, never silently read as 0.
Comparison with NONMEM
The equivalent NONMEM $ERROR block selects the residual error with IF:
IF (FREE.EQ.0) THEN
Y = IPRED*(1 + EPS(1)) ; total: PROP_ERR_TOTAL = SD of EPS(1)
ELSE
Y = IPRED*(1 + EPS(2)) ; unbound: PROP_ERR_UNBOUND = SD of EPS(2)
ENDIFferx’s if (FREE == 0) { DV ~ proportional(PROP_ERR_TOTAL) } else { DV ~ proportional(PROP_ERR_UNBOUND) } produces the same per-row residual variance (IPRED·σ)² with σ selected by FREE, and the same joint FOCEI objective — no synthetic CMT recoding required on either side.
Sigma scale
All sigma parameters are estimated and reported on the standard-deviation scale, not the variance scale. This is true for both proportional and additive components, and for both elements of a combined error model.
In particular:
| Error model | Initial value sigma = X means … |
|---|---|
proportional |
The residual SD scales as X · f, i.e. CV% = X · 100 |
additive |
The residual SD is X in the units of DV (no CV interpretation) |
combined |
First sigma is proportional (CV-style), second is additive (units of DV) |
So sigma PROP_ERR ~ 0.1 for a proportional model is a 10% CV initial value, not 1%. Likewise, the SE on a proportional sigma is on the SD scale — multiply by 100 for an SE in CV-percentage points.
The fitted YAML emits both the SD (estimate) and the variance (variance: estimate²); for proportional components it also emits cv_pct = estimate · 100 so downstream tooling does not have to re-derive it.
Examples
Proportional error (most common):
[parameters]
sigma PROP_ERR ~ 0.01
[error_model]
DV ~ proportional(PROP_ERR)
Additive error:
[parameters]
sigma ADD_ERR ~ 1.0
[error_model]
DV ~ additive(ADD_ERR)
Combined error:
[parameters]
sigma PROP_ERR ~ 0.1
sigma ADD_ERR ~ 0.5
[error_model]
DV ~ combined(PROP_ERR, ADD_ERR)
Correlated combined error:
[parameters]
block_sigma (PROP_ERR, ADD_ERR) = [0.04, 0.10, 1.00] FIX
[error_model]
DV ~ combined(PROP_ERR, ADD_ERR)
block_sigma uses NONMEM-style lower-triangle covariance values: [Var(PROP_ERR), Cov(PROP_ERR, ADD_ERR), Var(ADD_ERR)]. Diagonal entries initialize the named sigma SDs (sqrt(variance)) and the off-diagonal entries define fixed residual correlations. For the example above, PROP_ERR = 0.2, ADD_ERR = 1.0, and Corr(PROP_ERR, ADD_ERR) = 0.5, so the residual variance for a combined-error observation is:
\[ V = (f \sigma_\mathrm{prop})^2 + \sigma_\mathrm{add}^2 + 2 f \rho \sigma_\mathrm{prop}\sigma_\mathrm{add} \]
The diagonal sigma SDs are estimated unless marked FIX; the off-diagonal covariance is always converted to a fixed correlation. A nonzero block_sigma covariance can connect sigmas that co-load on the same combined(...) endpoint, or sigmas used by different endpoints measured at the same subject time and occasion — whether those endpoints are selected by the CMT column (per-CMT) or by a covariate if/else selector (#669). In the cross-endpoint case ferx builds a subject-level residual covariance matrix, so paired rows such as total/unbound assays receive the NONMEM-style covariance term f_total * f_unbound * Cov(EPS_total, EPS_unbound).
Current scope: block_sigma is supported for the Gaussian method = foce, method = focei, method = saem, and method = imp fits. The Gauss-Newton optimizers (gn / gn_hybrid), M3 censored-observation likelihoods, FREM covariate pseudo-observations, iiv_on_ruv, TTE endpoints, and IOV reject correlated residual sigma blocks until their residual-error derivative code is extended beyond diagonal RUV. SDE/EKF process-noise models carry the correlation under method = foce / method = focei (the EKF variance is added to the dense R) but are rejected under method = saem, whose M-step data term does not yet include the process-noise inflation.
simulate() draws each observation’s residual independently and does not reproduce a block_sigma off-diagonal covariance — simulated total/unbound (or cross-branch, for a covariate-selected model) rows are uncorrelated even though the estimation objective honours the correlation. A VPC or posterior-predictive check of the correlated endpoints will therefore understate the residual covariance; this applies to every block_sigma model, per-CMT and covariate-selected alike (tracked in issue #672).
Under method = focei the off-diagonal covariance enters the interaction correction as well as the data term: the Almquist 2015 conditional Hessian becomes H̃ = HᵀR⁻¹H + ½·B + Ω⁻¹ with B_{kl} = tr(R⁻¹ ∂R/∂η_k R⁻¹ ∂R/∂η_l) over the full dense residual covariance R (it reduces to the diagonal FOCEI a'diag(1/R)a + ½c̃'c̃ when the correlation is zero). Under method = imp the importance weights already use the dense R; the Student-t proposal precision is also built from R⁻¹ so the correlated posterior shape is sampled efficiently.
Both loops of a foce / focei correlated fit run exact analytic gradients (no finite differences): the within-observation cross term is carried through the same dense-R derivative kernels the objective uses (compute_dr_df_matrices, compute_d2r_df2_matrices), so the inner EBE η-gradient and the outer θ/Ω/σ gradient are noise-free and the auto optimizer resolves to a gradient-based method. The OFV is unchanged (Eval 1 on the anchor below is still 18.722087). In the analytic scope (analytical 1-/2-/3-cpt, single endpoint) the correlation only couples the σ-loadings within one observation, so R stays diagonal and the dense assembly reduces to the scalar contractions fed correlation-aware (R, ∂R/∂f, ∂²R/∂f²); a genuine cross-endpoint off-diagonal R (which needs a per-CMT / Form-C readout) falls back to per-subject finite differences.
Validation: examples/correlated_residual_combined.ferx and nonmem_anchor/correlated_residual_combined.ctl run the same fixed-parameter one-compartment IV model with $SIGMA BLOCK(2) FIX. On data/correlated_residual_combined.csv, NONMEM 7.5.1 (METHOD=1 MAXEVAL=0) reports OFV 18.715849 and ferx FOCE reports OFV 18.727365 (difference 0.0115). Adding INTER to that control file (METHOD=1 INTER MAXEVAL=0) reports OFV 18.722087, and ferx method = focei reports OFV 18.722087 — agreement to better than 1e-5, since the FOCEI Laplace marginal evaluates R at the conditional mode η̂ exactly as NONMEM’s INTER does (no Sheiner–Beal linearization point to disagree on).
The SAEM path reuses the same dense residual-covariance likelihood (dense_residual_data_term) for its E-step Metropolis acceptance and its theta/sigma M-step objective. Its reported OFV (the FOCE-approximation 2·pop_nll used for AIC/BIC comparability) now follows the interaction flag exactly like FOCE/FOCEI: with interaction on (the default) it reports the dense interaction marginal — ferx SAEM OFV 18.7221, matching ferx FOCEI and NONMEM METHOD=1 INTER — and with interaction = false it reports the non-interaction marginal (18.7274, matching ferx FOCE). Previously SAEM always fell back to the non-interaction marginal for correlated models regardless of the flag; the correlation itself was, and is, carried in both cases. As a marginal-likelihood sanity check, NONMEM’s importance-sampling −2LL at these reference parameters (METHOD=IMP EONLY=1, 30 000 samples) is 18.68; dropping the off-diagonal covariance (a diagonal $SIGMA) lowers all engines to ≈14.62, so the ≈4-OFV gap is the cross-endpoint correlation being correctly accounted for.
IIV on residual error (iiv_on_ruv)
NONMEM models often place an inter-individual random scaling on the residual error — each subject gets a log-normally scaled residual SD:
Y = IPRED + EPS(1) * EXP(ETA(4)) ; $OMEGA <var> ; IIV on RUV
ferx supports this with an iiv_on_ruv line in [error_model]. Declare the random effect as an ordinary omega (it is not referenced by any individual parameter), then name it:
[parameters]
...
omega ETA_RUV ~ 0.05 # IIV on residual error
sigma PROP_ERR ~ 0.1 (sd)
[error_model]
DV ~ proportional(PROP_ERR)
iiv_on_ruv = ETA_RUV
The per-subject residual variance of every observation is multiplied by exp(2 · ETA_RUV_i) — equivalently, the residual SD is scaled by exp(ETA_RUV_i). This applies uniformly to additive, proportional, and combined error (it scales the whole variance), matching EPS * EXP(ETA) for the common single-EPS case. The extra random effect is reported like any other omega (ETA_RUV appears in the omega matrix and eta names).
Notes and constraints:
- Requires an interaction or Monte-Carlo method: use
method = focei,imp,impmap, orsaem.Y = IPRED + EPS*EXP(ETA)makes the residual variance η-dependent, so non-interaction FOCE cannot represent it — ferx rejects that combination with a clear error. - The named eta must be a dedicated
omeganot shared with a structural individual parameter. - Not supported with per-CMT (multi-endpoint) error models.
- Exact analytic FOCE/FOCEI gradient. For analytical 1-/2-/3-cpt, ODE (
[odes]), and LTBS (log_additive) models,iiv_on_ruvfits use the exact outer θ/Ω/σ gradient — the residual-eta enters through the variance scaleexp(2·ETA_RUV), contributing the Almquist interaction column toH̃and the matching∂NLL/∂η_ruv = Σ(1−ε²/v)data term. The inner EBE gradient is analytic too except under LTBS, where it keeps finite differences (the established LTBS choice). IOV and M3-BLOQiiv_on_ruvmodels keep the finite-difference gradient.
Identifiability. A residual-error random effect is a variance-of-variance term and is only weakly identified when the data carry little per-subject residual-scale signal (few observations per subject, or a small true variance). In that regime the marginal correctly shrinks the estimate toward zero. Strong or well-sampled signals are recovered well. When the FOCEI estimate looks collapsed, prefer the Monte-Carlo estimators (imp / impmap / saem), which do not rely on the Laplace approximation.
See examples/iiv_on_ruv.ferx.
Impact on Estimation
The error model affects: - Individual weighted residuals (IWRES): (DV - IPRED) / sqrt(Var) — with iiv_on_ruv, Var includes the per-subject exp(2·ETA_RUV) scale at the EBE. - Conditional weighted residuals (CWRES): Accounts for uncertainty in random effect estimates - Objective function value (OFV): The likelihood includes log(Var) terms, so the error model structure directly influences parameter estimates