Scaling

Maturity: beta — see Feature Maturity for what this means.

The optional [scaling] block declares how the structural model’s raw output maps to the observed DV. It exists so the user does not have to fold unit conversion or amount-to-concentration arithmetic into the structural model itself — keeping [odes] and [structural_model] readable, and making mixed-unit data (e.g. data in ng/mL when the model thinks in mg/L) straightforward.

The convention is divisive: pred_scaled = pred_raw / scale. This matches the natural reading of obs_scale = V/1000 as “divide amount by V/1000 to get concentration in the user’s units.”

Three forms are supported. Each is optional; omitting [scaling] keeps the historical “raw prediction equals DV” behaviour.

Form A — scalar divisor

Use for fixed unit conversion (e.g. mg/L → mg/mL is a constant 1000).

[scaling]
  obs_scale = 1000

Applies to analytical PK models and ODE models alike: every prediction is divided by the constant before reaching the residual error model.

Form B — expression divisor

Use when the scale depends on theta, eta, or a covariate.

[scaling]
  obs_scale = WT / 70

Expressions may reference:

thetas (e.g. TVV),
etas (e.g. ETA_CL),
covariates (e.g. WT, CR),
individual parameters declared in [individual_parameters] (e.g. V, CL) — these are resolved from a subject-static evaluation of pk_param_fn at scale-evaluation time, so obs_scale = 1000 / V uses the per-subject V (typical value times the EBE eta).

The scale is evaluated once per subject with subject-level covariates (matching the no-TV path). Time-varying covariate support for expression scales is a Phase 1.5 follow-up.

Form C — explicit output expression

Use when the built-in concentration readout is not the observed quantity — e.g. an ODE state is held as an amount and the observation is a concentration, or the observation is a nonlinear function of the concentration (a protein-binding correction, a free-vs-total switch, a parent/metabolite sum). Form C replaces the default readout entirely.

[structural_model]
  ode(states=[depot, central])     # no obs_cmt — Form C provides it

[odes]
  d/dt(depot)   = -KA * depot
  d/dt(central) = KA * depot - CL/V * central   # central holds amount

[scaling]
  y = central / V

The right-hand side may reference state names (depot, central), individual parameters (CL, V, KA), thetas (TVCL), etas (ETA_CL), and covariates (WT). All five name classes are looked up at evaluation time — states from the ODE solver, individual parameters from the subject-static pk_param_fn, and thetas/etas/covariates from the values supplied by the caller.

Form C on analytical PK models (since #650)

Form C is also accepted on analytical (closed-form) PK models. There the readout replaces the built-in concentration; it can reference the same name classes, with the compartment-amount scope tied to what the closed forms expose:

central — the central compartment amount (always available). The analytic engine computes a concentration internally; the readout sees the amount, i.e. central = concentration × V, so y = central / V recovers the concentration and any additive term layers on top.
depot — the oral depot amount, for first-order oral models (one_cpt_oral / two_cpt_oral / three_cpt_oral).
Peripheral compartments are not available (the closed forms don’t expose a cross-compartment amount) — referencing periph/peripheral… is a parse error pointing you to an ODE model, mirroring [initial_conditions].

Individual parameters that are not a structural PK role (e.g. a binding capacity BMAX or dissociation constant KD) may be referenced in an analytic readout and are differentiated exactly — they become first-class differentiable parameters (#650). if/else works, so a per-row covariate can switch the readout.

A free-vs-total fluconazole readout on an analytic 1-cpt model, with a saturable albumin-binding correction on the total-drug rows:

[individual_parameters]
  CL   = TVCL * exp(ETA_CL)
  V    = TVV  * exp(ETA_V)
  BMAX = TVBMAX
  KD   = TVKD

[structural_model]
  pk one_cpt_iv(cl=CL, v=V)

[scaling]
  y = if (FREE == 0) central / V + BMAX * (central / V) / (KD + central / V) \
      else           central / V

Form C replaces the built-in readout, so it cannot be combined with a divisive obs_scale on the same model (fold any unit conversion into y, e.g. y = central / V / 1000); the parser rejects the combination.

Covariates in a Form C expression are read from the per-observation covariate snapshot — the values on each observation’s own data row — so a time-varying covariate drives the readout at the time it is observed (since #535/#538). Because a Form C covariate is therefore an input the prediction depends on, it is a required data column: if the readout references a covariate that is absent from the dataset the fit fails with E_MISSING_COVARIATE rather than silently reading the missing value as 0.0. For time-constant covariates the readout is unchanged.

This matches NONMEM’s per-record $ERROR semantics. Validated against a free/total protein-binding model (NONMEM ADVAN3 TRANS4 whose $ERROR selects the total concentration when FREE==0 and the unbound concentration when FREE==1, with paired assay rows at the same time): evaluated at identical parameters, ferx’s per-record population predictions reproduce NONMEM’s PRED to ~1e-4 relative on both the total-assay and free-assay rows — the two rows at a given time differing only by that per-record covariate.

The analytical Form C readout (#650) inherits this NONMEM anchor: on an IV 1-cpt model the analytic central amount (concentration × V) equals the ODE central state amount, so a saturable-binding total-vs-free readout gives bit-for-bit-equal predictions on the analytic and ODE paths at matched parameters — the analytic-vs-ODE equivalence is checked by analytic_form_c_matches_ode_form_c_binding_readout.

Runtime behaviour on bad scales

If an expression scale (Form B or C) evaluates to a non-positive or non-finite value at runtime — for example WT / 70 when WT is missing (reads 0) or 1 / (TVV - x) near a singularity — every prediction for that subject is set to NaN. The outer NLL then evaluates to NaN and the optimizer rejects the step. This matches established NLM convention (NONMEM’s OBJFN = NaN → step rejection) and surfaces bad scales in the per-subject diagnostics rather than silently producing a mis-scaled fit.

Comparison with NONMEM and nlmixr2

Need	NONMEM	nlmixr2	ferx
Scalar unit conversion	`S1 = 1000`	(multiplier in `cmt`/`f`)	`[scaling] obs_scale = 1000`
Amount-state ODE with concentration DV	`S2 = V/1000` plus `Y = A(2)/S2`	`cmt(central); f = central/V/1000`	`[scaling] y = central / (V/1000)`

The ferx form is divisive by convention, so an obs_scale = V/1000 reads as “divide raw by V/1000” — matching NONMEM’s S2.

Interaction with gradients

All [scaling] variants on the analytical PK path support the default gradient = auto setting and forced finite differences (gradient = fd):

Form	`auto`	`fd`	Notes
Scalar `obs_scale = K`	exact analytic	exact FD	The constant threads as one entry per observation.
Expression `obs_scale = <expr>`	subject-static analytic	exact FD	See subject-static caveat below.
Per-CMT `obs_scale[CMT=N]`	subject-static analytic	exact FD	One per-observation scale entry per observation, dispatched by `subject.obs_cmts[i]`.
Form C `y[…] = <expr>` (ODE)	exact analytic (in-scope ODE)	exact FD	Since #410/#439 the ODE sensitivity provider differentiates the Form C readout — uniform `y = <expr>` and per-CMT `y[CMT=N] = <expr>` — over `Dual2`/`Dual1` (outer and inner), including covariate references in the readout (#540): a covariate threads in as a constant from the per-observation snapshot, so a free→total protein-binding readout gated on a `FREE` flag stays analytic. A θ or η referenced directly in the readout (rather than via an `[individual_parameters]` entry) is also analytic since #486 — the parser desugars each bare `THETA(i)`/`ETA(k)` into a hidden individual parameter, so the readout’s `∂y/∂θ`/`∂y/∂η` rides the individual-parameter sensitivity chain; only a readout referencing a neural-network output, or other out-of-scope ODE features (input-rate/SDE, unsupported steady-state combinations, …), still falls back to FD.
Form C `y = <expr>` (analytical)	exact analytic	exact FD	Since #650 the analytic sensitivity provider differentiates an analytical Form C readout over `Dual2`/`Dual1` (outer and inner), on the static dose-superposition path, the time-varying-covariate / oral-infusion event-walk path, and (since #655) IOV subjects (`kappa` declarations) — so a readout gated on a per-row covariate (a free-vs-total `FREE` flag) stays analytic, including under IOV. The readout’s central-compartment amount (`concentration × V`), the individual parameters it references — including non-structural ones like a binding `BMAX`/`KD`, which get a free differentiable PK slot — and covariates all flow through the exact η/θ chain; under IOV the readout parameters are BSV-only (a `kappa` reference is rejected at parse), so only the concentration carries the occasion κ. Falls back to FD (with a parse warning; the prediction stays exact) for: a readout referencing the oral depot amount, a per-CMT `y[CMT=N]`, a direct `THETA`/`ETA` or neural-network reference, a `[initial_conditions]` baseline, or (on the event-walk / IOV paths) a readout whose non-structural parameters overflow the eight structural sensitivity slots.

Analytic outer gradient handles expression scaling exactly. The analytic sensitivity provider that drives the gradient-based outer optimizers (bfgs, lbfgs, slsqp, …) on analytical 1-/2-/3-compartment models compiles an obs_scale expression to a Dual2-differentiable program and differentiates the scaled prediction f / scale exactly — including its η and θ dependence. So an η-dependent scale like 1000 / V is handled exactly by the analytic FOCE/FOCEI outer gradient with no gradient = fd needed.

Inner EBE loop also handles expression scaling analytically (since #486). The light inner-gradient provider now carries the η-only quotient rule ∂(f/scale)/∂η = (∂f/∂η)/scale − f·(∂scale/∂η)/scale² — the η-block of the outer’s quotient rule, evaluated once per subject over the same Dual2-differentiable scale program — so the per-subject inner EBE loop is exact for an η-dependent expression obs_scale like 1000 / V, not just the outer gradient. The ODE ([odes]) path also serves an η-dependent obs_scale analytically on both loops: on the static walk since #486, on the non-IOV time-varying-covariate walk since #486 (the divisor is subject-static, so one subject-static jet is applied post-walk), for IOV since #575, and for IOV subjects that also have time-varying covariates since #590. The closed-form (analytical 1-/2-/3-compartment) IOV path serves it analytically on both loops since #486 as well, including IOV subjects with time-varying covariates — the same per-occasion post-walk quotient the ODE IOV path uses. The IOV expression-scale quotient follows production semantics: the scale is evaluated once per occasion group with the subject-level covariate snapshot, while the event walk still uses per-event covariate snapshots for the dynamics. Per-CMT obs_scale[CMT=N] falls outside the analytic provider’s scope entirely, and ODE expression scaling combined with LTBS still routes to FD. Result-neutral — estimates and SEs are unchanged.

Interaction with SDE / `[diffusion]`

In Phase 1, [scaling] is not supported on SDE models. The EKF / Kalman update computes both the predicted mean and the prediction covariance p_obs in the observation space, and the per-observation r_obs callback evaluates the residual variance from that predicted mean. Forms A/B post-multiply only the mean, so the EKF variance would remain in the unscaled space — producing mis-scaled OFVs.

A correct SDE+scaling integration needs the scale factor threaded into both the EKF p_obs propagation (scales by 1/K²) and the residual variance callback. That’s a wider change deferred to Phase 1.5. Until then, the parser rejects any [scaling] block on a model with a [diffusion] block (Forms A, B, and C alike).

Multi-analyte / per-CMT scaling

For models that observe multiple compartments (parent + metabolite, sum-of-moieties, free vs. total, …), specify a separate scale per observed CMT using the obs_scale[CMT=N] (Forms A/B) or y[CMT=N] (Form C) syntax. N is the 1-based CMT index from the data file’s CMT column.

[scaling]
  obs_scale[CMT=1] = 1000    # parent in mg/L → mg/mL
  obs_scale[CMT=2] = 1       # metabolite already in target units

Form C (ODE) per-CMT:

[structural_model]
  ode(states=[depot, parent, metab])

[scaling]
  y[CMT=1] = parent / V_parent
  y[CMT=2] = metab  / V_metab

Coverage rule — every CMT that has at least one observation in the data must have a matching [CMT=N] entry. The parser only checks syntax; the fit-time validation (run automatically at the top of fit()) errors with a list of the missing CMTs:

[scaling]: per-CMT scaling is missing entries for observed CMTs [2, 3].
Every observed CMT must have an `obs_scale[CMT=N]` (or `y[CMT=N]` for ODE) entry.

Mixing rule — the uniform form (obs_scale = K) and the per-CMT form (obs_scale[CMT=N] = K) are mutually exclusive within the same group. The parser rejects mixing them so the user is explicit about intent. The same rule applies to y and y[CMT=N].

Gradients — per-CMT obs_scale[CMT=N] works with both gradient = auto and gradient = fd. The analytic route materialises a per-observation scale array (one entry per observation in the subject) from a subject-static pk_param_fn evaluation. See Interaction with gradients for the subject-static caveat that applies to all expression-form scales.

Since #439, Form C per-CMT (y[CMT=N] = <expr>) is differentiated analytically by the ODE sensitivity provider — each endpoint’s compiled output program is evaluated over Dual2 (outer gradient) and Dual1 (inner EBE η-gradient), dispatched per observation by subject.obs_cmts[i]. So gradient = auto serves it; gradient = fd is no longer required (the fit falls back to FD only for out-of-scope ODE features such as input-rate/SDE, unsupported steady-state combinations, or LTBS).

Since #486, an η-dependent expression obs_scale = <expr> divisor on an ODE model is also differentiated analytically (previously a Form-C readout y = state/V was the only analytic route). The same quotient rule the closed-form provider uses (apply_expression_scale_*) is applied to the ODE prediction jet for both the outer θ/Ω/σ gradient and the inner EBE η-gradient. The static ODE walk evaluates one scale jet per subject; because the divisor is subject-static even under time-varying covariates (production evaluates it at the subject covariate snapshot), the non-IOV time-varying-covariate walk applies the same single subject-static jet post-walk since #486. The ODE IOV walk evaluates one scale jet per occasion group, so an IOV model with time-varying covariates in the dynamics and obs_scale = V stays on the analytic route since #590. Combined with LTBS, an ODE expression obs_scale still falls back to FD because the walk applies LTBS before the η/θ chain.

Numerical validation. The per-CMT analytic gradient inherits the NONMEM validation of the user-ODE sensitivity engine (#410): the gradient machinery — the augmented Dual2/Dual1 RK45 and the η/θ chain — is unchanged, and per-CMT only selects, per observation, which CMT’s compiled Form-C output program to evaluate. Each such program is the same readout already validated for the uniform y = <expr> case. The per-observation routing itself is verified by the ode_provider_percmt_matches_production test (analytic f, ∂f/∂η, ∂f/∂θ vs the production predictor + finite differences on a two-endpoint model) and the ode_provider_percmt_light_matches_full test (the Dual1 inner η-gradient equals the Dual2 outer to 1e-9). A dedicated NONMEM cross-check on a multi-endpoint model is therefore not re-run here; it is covered transitively by #410’s validation plus these routing tests.

The time-varying-covariate analytic gradient (#439) inherits #410’s NONMEM validation on the same basis: it switches the per-event PK parameters at each covariate breakpoint (the event-driven Dual2/Dual1 walk) but the underlying integrator and η/θ chain are unchanged, exactly mirroring how production’s compute_predictions_with_tv switches parameters over f64. The walk is verified against that production predictor + finite differences — first order by ode_provider_tvcov_matches_production and the second-order Hessian blocks (d2f_deta2, d2f_deta_dtheta) by central-differencing the validated first-order gradient — and the Dual1 inner equals the Dual2 outer (ode_provider_tvcov_light_matches_full).