Example: Data selection (LLOQ filtering)

The [data_selection] block excludes records from the dataset at read time without modifying the CSV. It is the ferx equivalent of NONMEM’s $DATA IGNORE=. This example drops observations below a surrogate LLOQ of 1.0 mg/L.

Model

library(ferx)

ex  <- ferx_example("warfarin_data_selection")
ferx_model_show(ex$model)
# model: warfarin_data_selection.ferx 
# One-compartment oral PK model (warfarin) with data-selection filtering
#
# Demonstrates [data_selection]: excludes observations below a surrogate LLOQ
# (DV < 1.0 mg/L) at read time without modifying the CSV.
# Equivalent to NONMEM $DATA IGNORE=.

[parameters]
  theta TVCL(0.134, 0.001, 10.0)
  theta TVV(8.1, 0.1, 500.0)
  theta TVKA(1.0, 0.01, 50.0)

  omega ETA_CL ~ 0.07
  omega ETA_V  ~ 0.02
  omega ETA_KA ~ 0.40

  sigma PROP_ERR ~ 0.01 (sd)

[individual_parameters]
  CL = TVCL * exp(ETA_CL)
  V  = TVV  * exp(ETA_V)
  KA = TVKA * exp(ETA_KA)

[structural_model]
  pk one_cpt_oral(cl=CL, v=V, ka=KA)

[error_model]
  DV ~ proportional(PROP_ERR)

[data_selection]
  # Drop observations below a surrogate LLOQ of 1.0 mg/L.
  ignore = DV < 1.0

[fit_options]
  method   = foce
  maxiter  = 300
  gradient = fd

The relevant block is:

[data_selection]
  ignore = DV < 1.0

R-side preview

Before fitting, use ferx_selection() to inspect which records the filter would drop:

sel <- ferx_selection(ex$data, ignore = "DV < 1.0")

cat("Total records:", nrow(read.csv(ex$data)), "\n")
Total records: 120 
cat("Retained:     ", nrow(sel), "\n")
Retained:      118 
cat("Excluded obs: ", nrow(ferx_selection_excluded(sel)), "\n")
Excluded obs:  2 

The excluded rows carry a .exclude_reason column:

ferx_selection_excluded(sel)[, c("ID", "TIME", "DV", ".exclude_reason")]
   ID TIME     DV  .exclude_reason
84  7  120 0.9761 ignore: DV < 1.0
96  8  120 0.8700 ignore: DV < 1.0
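The tagging behaviour shown above can be imitated in base R. The sketch below is not how ferx implements the filter: the toy data frame, the `tag_ignored()` helper, and the choice to never match rows with a missing DV (such as dose records) are all assumptions made for illustration.

```r
# Toy data: one dose row (DV missing) followed by three observations.
dat <- data.frame(
  ID   = c(1, 1, 1, 1),
  TIME = c(0, 24, 72, 120),
  DV   = c(NA, 4.2, 1.8, 0.9)
)

# Hypothetical helper: evaluate a condition string against the data and
# record why each matching row would be dropped.
tag_ignored <- function(data, condition) {
  hit <- eval(parse(text = condition), envir = data)
  hit <- !is.na(hit) & hit   # assumption: rows with missing DV never match
  data$.exclude_reason <- ifelse(hit, paste("ignore:", condition), NA_character_)
  data
}

tagged   <- tag_ignored(dat, "DV < 1.0")
retained <- tagged[is.na(tagged$.exclude_reason), ]
excluded <- tagged[!is.na(tagged$.exclude_reason), ]
```

Tagging rows instead of deleting them keeps the original data intact and makes every exclusion traceable to the condition that caused it.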

Fit

fit <- ferx_fit(ex$model, ex$data, verbose = FALSE)
print(fit)
============================================================
 NONLINEAR MIXED EFFECTS MODEL ESTIMATION
============================================================
 Model: warfarin_data_selection  Dataset: warfarin
 Method: FOCE | Gradient: FD | Subjects: 10 | Obs: 108

 STATUS: CONVERGED   96 iterations   0.2s
 OFV: -266.9150    AIC: -252.9150    BIC: -234.1401

DATA SELECTION
------------------------------------------------------------
  Records read: 120    Obs excl.: 2    Doses excl.: 0    Other excl.: 0
  Fired ignore: DV < 1.0

MODEL STRUCTURE (auto-derived)
------------------------------------------------------------
  Structural:  1-cpt oral  (TVCL, TVV, TVKA)
  IIV:         ETA_CL, ETA_V, ETA_KA
  IOV:         none
  Residual:    proportional

THETA
------------------------------------------------------------
Parameter            Estimate           SE       %RSE
----------------------------------------------------
TVCL                 0.132905     0.006664        5.0
TVV                  7.730678     0.233804        3.0
TVKA                 0.722044     0.124469       17.2

OMEGA  (between-subject variability)
------------------------------------------------------------
  ETA_CL                   [log-normal]  = 0.028609  CV% = 17.0  SE = 0.012757
  ETA_V                    [log-normal]  = 0.009509  CV% = 9.8  SE = 0.004259
  ETA_KA                   [log-normal]  = 0.349301  CV% = 64.7  SE = 0.161022

SIGMA  (residual error)
------------------------------------------------------------
  PROP_ERR         [proportional] = 0.010776  (var = 0.000116, CV% = 1.1)  SE = 0.000952  [initial specified as SD]

SHRINKAGE
------------------------------------------------------------
 ETA_CL: -0.3%   ETA_V: 0.1%   ETA_KA: -0.0%   EPS: 17.9%

DIAGNOSTICS
------------------------------------------------------------
 Covariance: computed   Cond: 2.6   DW: 2.65 [negative autocorrelation]   IWRES lag-1 r: -0.373

RUN INFO
------------------------------------------------------------
 Gradient (requested): fd   (used: fd)
 ferx v0.1.6 (core v0.1.6)

SETTINGS  (model file / call-time override)
------------------------------------------------------------
  method                       foce  [model only]
  maxiter                      300  [model only]
  gradient                     fd  [model only]

------------------------------------------------------------
 1 warning  --  call ferx_warnings(fit) for details
============================================================

The print output includes a DATA SELECTION section showing how many records were excluded and which conditions fired.
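Several derived numbers in the printout can be checked by hand. The sketch below assumes the conventional definitions AIC = OFV + 2k and BIC = OFV + k * log(n), with k = 7 estimated parameters (three thetas, three omegas, one sigma) and n = 108 observations. It also assumes the log-normal relation CV = sqrt(exp(omega) - 1), where omega is the variance, and %RSE = 100 * SE / estimate. These are assumptions about how ferx derives the values, chosen because they agree with the printed figures.

```r
ofv   <- -266.9150
k     <- 7      # 3 thetas + 3 omegas + 1 sigma (assumed parameter count)
n_obs <- 108    # "Obs:" in the print header

aic <- ofv + 2 * k
bic <- ofv + k * log(n_obs)

# Log-normal CV% from the OMEGA variances in the printout.
omega  <- c(ETA_CL = 0.028609, ETA_V = 0.009509, ETA_KA = 0.349301)
cv_pct <- 100 * sqrt(exp(omega) - 1)

# %RSE = 100 * SE / estimate for the THETA block.
est     <- c(TVCL = 0.132905, TVV = 7.730678, TVKA = 0.722044)
se      <- c(TVCL = 0.006664, TVV = 0.233804, TVKA = 0.124469)
rse_pct <- 100 * se / est

round(c(AIC = aic, BIC = bic), 4)
round(cv_pct, 1)
round(rse_pct, 1)
```

Note that n here is the number of observations after filtering, so the BIC of a filtered fit is not directly comparable with that of a fit to the full dataset.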

Exclusion details in fit$exclusions

fit$exclusions$n_records_total    # total records read
[1] 120
fit$exclusions$n_obs_excluded     # excluded observations
[1] 2
fit$exclusions$fired_ignore       # which ignore conditions fired
[1] "ignore: DV < 1.0"
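These counts can be reconciled with the print header and the R-side preview. The following is a small bookkeeping sketch using only the numbers reported above; the variable names are illustrative, and the interpretation of the remaining records as dose records is an assumption, since the dataset layout is not shown here.

```r
n_records_total <- 120   # fit$exclusions$n_records_total
n_obs_excluded  <- 2     # fit$exclusions$n_obs_excluded
n_obs_used      <- 108   # "Obs:" in the print header
n_retained      <- 118   # nrow(sel) in the R-side preview

# Records that survive the filter.
stopifnot(n_records_total - n_obs_excluded == n_retained)

# Retained records that are not observations (presumably dose records).
n_non_obs <- n_retained - n_obs_used
n_non_obs
```

The result is 10, which would be consistent with one dose record for each of the 10 subjects.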

Comparison with the unfiltered fit

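The two records at TIME = 120 (the latest time point) are removed. Because only 2 of the 110 observations are dropped, and they are the lowest-concentration samples, the effect on the estimates is small: TVCL shifts in the fourth decimal place and TVV is unchanged to four decimals: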

ex_all  <- ferx_example("warfarin")
fit_all <- ferx_fit(ex_all$model, ex_all$data, verbose = FALSE)

# Index the named theta vectors directly so the column names stay clean.
pars <- c("TVCL", "TVV")
rbind(
  unfiltered = round(fit_all$theta[pars], 4),
  filtered   = round(fit$theta[pars], 4)
)
             TVCL    TVV
unfiltered 0.1330 7.7307
filtered   0.1329 7.7307
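To put the difference on a relative scale, the percent change can be computed from the table. This sketch uses the rounded values printed above rather than the fit objects, so the result is only as precise as four decimals allow.

```r
# Estimates as printed in the comparison table (rounded to 4 decimals).
est <- rbind(
  unfiltered = c(TVCL = 0.1330, TVV = 7.7307),
  filtered   = c(TVCL = 0.1329, TVV = 7.7307)
)

# Percent change of the filtered fit relative to the unfiltered fit.
pct_change <- 100 * (est["filtered", ] - est["unfiltered", ]) / est["unfiltered", ]
round(pct_change, 2)
```

With these inputs the change in TVCL is below 0.1% in magnitude and the change in TVV is zero at the printed precision.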

R-side filtering as an alternative to [data_selection]

Pass a ferx_selection() result directly to ferx_fit() instead of writing a [data_selection] block. Both approaches produce the same fit:

filtered_data <- ferx_selection(ex$data, ignore = "DV < 1.0")
fit2 <- ferx_fit(ex$model, filtered_data)

In this example, ex$model already contains the same ignore condition. When both sources supply conditions, they are merged and duplicate expressions are removed automatically.
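The merge can be pictured as a set union over condition strings. This is a sketch of the idea only: `merge_conditions()` and the `TIME > 96` condition are hypothetical, and whether ferx compares expressions textually or after some normalization is not stated here.

```r
model_conditions <- c("DV < 1.0")               # from [data_selection]
call_conditions  <- c("DV < 1.0", "TIME > 96")  # hypothetical R-side conditions

# Hypothetical merge: concatenate, then drop exact duplicates,
# keeping the first occurrence of each expression.
merge_conditions <- function(a, b) unique(c(a, b))

merged <- merge_conditions(model_conditions, call_conditions)
merged
```

Under exact string matching, `DV < 1.0` and `DV<1.0` would count as different expressions, so writing conditions consistently in both places is the safer habit.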

See also