Robust multivariate normal diagnostics for complr coordinates
Source: R/diagnostics.R
diagnostics.complr.Rd
Uses robust minimum covariance determinant estimates to calculate squared
Mahalanobis distances for the total, between, or within log-ratio
coordinates stored in a complr object.
This can be used to identify extreme values (outliers).
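As a rough illustration of the idea, and not the package's internal code, the sketch below computes squared Mahalanobis distances from robust minimum covariance determinant estimates on simulated data. It assumes the robustbase package is installed; the matrix coords is a hypothetical stand-in for a set of log-ratio coordinates.

```r
library(robustbase)

set.seed(1)
# Hypothetical stand-in for log-ratio coordinates (100 rows, 4 columns)
coords <- matrix(rnorm(400), ncol = 4)
coords[1:3, ] <- coords[1:3, ] + 6  # plant three gross outliers

# Robust location and scatter via the minimum covariance determinant
mcd <- covMcd(coords)

# Squared Mahalanobis distances relative to the robust estimates
d2 <- mahalanobis(coords, center = mcd$center, cov = mcd$cov)

# The planted outliers should have the largest distances
head(order(d2, decreasing = TRUE), 3)
```

Because the centre and scatter are estimated from the most concentrated subset of the data, the planted outliers do not pull the estimates toward themselves, so their distances stay large.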
Usage
# S3 method for class 'complr'
diagnostics(
x,
level = c("total", "between", "within"),
parts = 1,
ev.perc = 0.001,
extremevalues = c("theoretical", "empirical"),
...
)
Arguments
- x
A complr object.
- level
A character string indicating which coordinates to diagnose: "total", "between", or "within".
- parts
Optional character vector or integer specifying which set of compositional parts should be used. Defaults to the first composition in the complr object.
- ev.perc
Proportion in the upper tail used to flag extreme values. Defaults to 0.001.
- extremevalues
Character string indicating whether extreme values should be flagged using a "theoretical" chi-squared cutoff or an "empirical" cutoff from the observed robust distances.
- ...
Additional arguments passed to covMcd.
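The difference between the two cutoff types can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: it assumes the theoretical cutoff is the upper ev.perc quantile of a chi-squared distribution with degrees of freedom equal to the number of coordinates, and that the empirical cutoff is the matching quantile of the observed distances. The names d2, p, cut_theoretical, and cut_empirical are hypothetical.

```r
set.seed(2)
p <- 4                     # hypothetical number of log-ratio coordinates
ev.perc <- 0.05            # upper-tail proportion used for flagging
d2 <- rchisq(200, df = p)  # simulated stand-in for squared robust distances

# Theoretical cutoff: upper-tail quantile of the chi-squared reference
cut_theoretical <- qchisq(1 - ev.perc, df = p)

# Empirical cutoff: upper-tail quantile of the observed distances
cut_empirical <- quantile(d2, probs = 1 - ev.perc, names = FALSE)

# Flag observations beyond each cutoff
flag_theoretical <- d2 > cut_theoretical
flag_empirical <- d2 > cut_empirical

c(theoretical = cut_theoretical, empirical = cut_empirical)
```

Under these assumptions the empirical rule always flags roughly ev.perc of the observations, whereas the theoretical rule flags more or fewer depending on how closely the distances follow the chi-squared reference.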
Value
A diagnostics object with the raw composition matrix in
x, a data.table of robust distances in distance, the
extreme-value cutoff in cutoff, the requested upper-tail proportion
in ev.perc, the cutoff type in extremevalues, and the
diagnosed coordinate level in levels. The distance element
includes an is_extremevalue column indicating whether each fitted
observation is flagged as an extreme value. For level = "between",
diagnostics are fitted using one observation per idvar.
Examples
data(mcompd)
data(sbp)
ids <- unique(mcompd$ID)[1:20]
cilr <- complr(
data = mcompd[ID %in% ids, .SD[1:3], by = ID],
sbp = sbp,
parts = c("TST", "WAKE", "MVPA", "LPA", "SB"),
idvar = "ID",
total = 1440
)
# One diagnostic object per coordinate level
total_diag <- diagnostics(cilr, level = "total")
between_diag <- diagnostics(cilr, level = "between")
within_diag <- diagnostics(cilr, level = "within")
# Use an empirical cutoff and a larger tail proportion
empirical_diag <- diagnostics(
cilr,
level = "between",
ev.perc = .05,
extremevalues = "empirical"
)
is.diagnostics(between_diag)
#> [1] TRUE
head(between_diag$distance)
#> distance expected_distance deviates is_extremevalue
#> <num> <num> <num> <lgcl>
#> 1: 14.364835 5.672230 8.6926050 FALSE
#> 2: 1.417544 1.218762 0.1987823 FALSE
#> 3: 3.403716 3.518969 -0.1152530 FALSE
#> 4: 26.231626 6.342329 19.8892973 TRUE
#> 5: 2.545905 2.898220 -0.3523146 FALSE
#> 6: 57.552119 11.143287 46.4088323 TRUE