
Uses robust minimum covariance determinant (MCD) estimates of location and scatter to calculate squared Mahalanobis distances for the total, between, or within log-ratio coordinates stored in a complr object. Large distances can be used to identify extreme values (outliers).
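The robust distances described above can be sketched in a few lines of R. This is a minimal sketch under stated assumptions, not the package's implementation. It uses MASS::cov.rob(method = "mcd") as a stand-in for robustbase::covMcd. The helper robust_distances() and the simulated matrix z, which stands in for log-ratio coordinates, are hypothetical.

```r
# Sketch only: robust squared Mahalanobis distances from an MCD fit.
# MASS::cov.rob(method = "mcd") stands in for robustbase::covMcd here;
# robust_distances() is a hypothetical helper, not part of the package.
robust_distances <- function(z) {
  # Robust location (center) and scatter (cov) from the MCD estimator
  fit <- MASS::cov.rob(z, method = "mcd")
  # Squared Mahalanobis distance of every row from the robust center
  stats::mahalanobis(z, center = fit$center, cov = fit$cov)
}

# Hypothetical data: 200 rows of 4 coordinates, first 3 rows shifted away
set.seed(42)
z <- matrix(rnorm(200 * 4), ncol = 4)
z[1:3, ] <- z[1:3, ] + 10
d <- robust_distances(z)

# Row indices of the three largest robust distances
head(order(d, decreasing = TRUE), 3)
```

Because the MCD fit is driven by the most concentrated subset of rows, the shifted rows barely influence the center and scatter, so their distances stay large instead of being masked.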

Usage

# S3 method for class 'complr'
diagnostics(
  x,
  level = c("total", "between", "within"),
  parts = 1,
  ev.perc = 0.001,
  extremevalues = c("theoretical", "empirical"),
  ...
)

Arguments

x

A complr object.

level

A character string indicating which coordinates to diagnose: "total", "between", or "within".

parts

Optional character vector or integer specifying which set of compositional parts should be used. Defaults to the first composition in the complr object.

ev.perc

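Upper-tail proportion used to flag extreme values. Defaults to 0.001, so observations above the 99.9th percentile of the chosen reference distribution (theoretical or empirical) are flagged.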

extremevalues

Character string indicating whether extreme values should be flagged using a "theoretical" chi-squared cutoff or an "empirical" cutoff from the observed robust distances.

...

Additional arguments passed to robustbase::covMcd() (for example, alpha).
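The ev.perc and extremevalues arguments can be illustrated with a hypothetical helper. extreme_cutoff() is not a package function. It assumes that the theoretical cutoff is the chi-squared quantile at 1 - ev.perc, with degrees of freedom equal to the number of log-ratio coordinates. It also assumes that the empirical cutoff is the sample quantile of the observed robust distances. The six distances are copied from the example output on this page. With 4 coordinates (5 parts) and the default ev.perc, the theoretical rule flags the same rows as that output.

```r
# Hypothetical helper, not part of the package.
# ASSUMPTION: "theoretical" = chi-squared quantile with df equal to the
# number of log-ratio coordinates; "empirical" = sample quantile of the
# observed robust distances.
extreme_cutoff <- function(distance, df, ev.perc = 0.001,
                           extremevalues = c("theoretical", "empirical")) {
  extremevalues <- match.arg(extremevalues)
  if (extremevalues == "theoretical") {
    stats::qchisq(1 - ev.perc, df = df)
  } else {
    unname(stats::quantile(distance, probs = 1 - ev.perc))
  }
}

# Distances copied from the example output; 5 parts give 4 coordinates
d <- c(14.364835, 1.417544, 3.403716, 26.231626, 2.545905, 57.552119)

# Theoretical cutoff at the default ev.perc = 0.001
cut_theory <- extreme_cutoff(d, df = 4)
d > cut_theory

# Empirical cutoff with a larger tail proportion
cut_emp <- extreme_cutoff(d, df = 4, ev.perc = 0.05,
                          extremevalues = "empirical")
d > cut_emp
```

The theoretical cutoff depends only on ev.perc and the number of coordinates. The empirical cutoff moves with the observed distances, so some rows are always at or above it, even in clean data.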

Value

A diagnostics object with the raw composition matrix in x, a data.table of robust distances in distance, the extreme-value cutoff in cutoff, the requested upper-tail proportion in ev.perc, the cutoff type in extremevalues, and the diagnosed coordinate level in levels. The distance element includes an is_extremevalue column indicating whether each fitted observation is flagged as an extreme value. For level = "between", diagnostics are fitted using one observation per idvar.
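The following hypothetical sketch shows how the three levels can relate. split_levels() is not a package function, and complr() may compute these coordinates differently. It assumes that between coordinates are the person means of the total log-ratio coordinates and that within coordinates are the deviations from those means. Because the ilr transform maps the compositional (closed geometric) mean to the arithmetic mean of the coordinates, this mirrors a compositional between/within split. The between_per_id element keeps one row per person, matching the note above that level = "between" is fitted using one observation per idvar.

```r
# Hypothetical sketch, not the package's implementation.
# ASSUMPTION: between = person mean of the total coordinates,
# within = total - between.
split_levels <- function(z, id) {
  # Person means of each coordinate, repeated on every row of that person
  between <- apply(z, 2, function(col) ave(col, id))
  # Deviation of each observation from its person mean
  within <- z - between
  list(
    total = z,
    between = between,
    within = within,
    # One row per person, as used for level = "between"
    between_per_id = between[!duplicated(id), , drop = FALSE]
  )
}

# Toy coordinates: 6 observations from 3 people, 2 coordinates
z <- matrix(c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60), ncol = 2)
id <- c(1, 1, 2, 2, 2, 3)
parts <- split_levels(z, id)
parts$between_per_id
```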

Examples

data(mcompd)
data(sbp)

ids <- unique(mcompd$ID)[1:20]
cilr <- complr(
  data = mcompd[ID %in% ids, .SD[1:3], by = ID],
  sbp = sbp,
  parts = c("TST", "WAKE", "MVPA", "LPA", "SB"),
  idvar = "ID",
  total = 1440
)

# One diagnostic object per coordinate level
total_diag <- diagnostics(cilr, level = "total")
between_diag <- diagnostics(cilr, level = "between")
within_diag <- diagnostics(cilr, level = "within")

# Use an empirical cutoff and a larger tail proportion
empirical_diag <- diagnostics(
  cilr,
  level = "between",
  ev.perc = .05,
  extremevalues = "empirical"
)

is.diagnostics(between_diag)
#> [1] TRUE
head(between_diag$distance)
#>     distance expected_distance   deviates is_extremevalue
#>        <num>             <num>      <num>          <lgcl>
#> 1: 14.364835          5.672230  8.6926050           FALSE
#> 2:  1.417544          1.218762  0.1987823           FALSE
#> 3:  3.403716          3.518969 -0.1152530           FALSE
#> 4: 26.231626          6.342329 19.8892973            TRUE
#> 5:  2.545905          2.898220 -0.3523146           FALSE
#> 6: 57.552119         11.143287 46.4088323            TRUE
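The printed table also includes expected_distance and deviates columns. In every row shown, deviates equals distance minus expected_distance. One plausible reading, stated as an assumption rather than the package's documented method, is that expected_distance is the chi-squared quantile at the plotting position that matches each distance's rank, as in a chi-squared Q-Q plot. With 20 IDs and 4 coordinates, qchisq(ppoints(20), df = 4) gives 11.143287 for the top rank, which matches row 6 above. The helper qq_expected() is hypothetical.

```r
# Hypothetical helper, not part of the package.
# ASSUMPTION: expected_distance is the chi-squared quantile at the
# plotting position that matches each distance's rank.
qq_expected <- function(distance, df) {
  n <- length(distance)
  # Chi-squared quantiles at plotting positions ppoints(n), increasing
  q <- stats::qchisq(stats::ppoints(n), df = df)
  # Give each distance the quantile of the same rank
  expected <- q[rank(distance, ties.method = "first")]
  data.frame(
    distance = distance,
    expected_distance = expected,
    deviates = distance - expected
  )
}

# Simulated distances standing in for 20 robust distances on 4 coordinates
set.seed(1)
res <- qq_expected(stats::rchisq(20, df = 4), df = 4)
head(res)
```

Under this reading, a large positive deviates value means that an observation lies farther out than its rank would predict under the chi-squared reference.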