Reliability reference scripts
RABET 1.3.2 introduced the Reliability tab, which computes inter-rater
and intra-rater agreement entirely inside the application using the
pingouin Python package. This folder
contains an independent R implementation that exists for two reasons:
- Cross-language reproducibility. Researchers who run their statistics pipeline in R can verify the in-app numbers by recomputing the same agreement matrix with the canonical R packages (psych and, optionally, irr).
- Reviewer transparency. When publishing reliability numbers from RABET, citing both the in-app pingouin computation and an independent R reference reassures reviewers that the agreement matrix is implementation-neutral.
What is provided
| File | Purpose |
|---|---|
| compute_agreement.R | Stand-alone R script. Loads two summary_table.csv files, matches rows by animal_id, computes per-metric ICC(2,1), Pearson r and mean absolute difference, and writes a results CSV. Mirrors RABET's Summary mode. |
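The row-matching step can be illustrated with a short Python sketch. This is not the R script's code. It assumes each summary CSV has an `animal_id` column followed only by numeric metric columns. The helpers `read_summary` and `match_by_animal` and the metric name `attack_count` are hypothetical. Animals that appear in only one file are collected rather than silently dropped, in the spirit of the unmatched_a/unmatched_b fields that RABET reports.

```python
import csv
import io


def read_summary(text):
    """Parse summary-table CSV text into {animal_id: {metric: value}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        animal = row.pop("animal_id")
        # Assumption: every non-ID column holds a numeric metric.
        table[animal] = {metric: float(value) for metric, value in row.items()}
    return table


def match_by_animal(a, b):
    """Pair scorer A and scorer B values per metric, keyed on animal_id.

    Returns (pairs, unmatched_a, unmatched_b), where pairs maps each metric
    present in both files to two equal-length lists in the same animal order.
    """
    shared = sorted(set(a) & set(b))
    unmatched_a = sorted(set(a) - set(b))   # scored by A only
    unmatched_b = sorted(set(b) - set(a))   # scored by B only
    # Metrics common to both files (read from the first shared animal).
    metrics = sorted(set(a[shared[0]]) & set(b[shared[0]])) if shared else []
    pairs = {m: ([a[i][m] for i in shared], [b[i][m] for i in shared])
             for m in metrics}
    return pairs, unmatched_a, unmatched_b


# Hypothetical two-scorer example with one unmatched animal on each side.
scorer_a = read_summary("animal_id,attack_count\nm1,4\nm2,7\nm3,2\n")
scorer_b = read_summary("animal_id,attack_count\nm1,5\nm2,7\nm4,1\n")
pairs, only_a, only_b = match_by_animal(scorer_a, scorer_b)
print(pairs)           # {'attack_count': ([4.0, 7.0], [5.0, 7.0])}
print(only_a, only_b)  # ['m3'] ['m4']
```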
A Detailed-mode (time-window Cohen's kappa / Krippendorff's alpha) R reference will follow in a later release. The pingouin implementation inside RABET is the authoritative computation in the meantime.
Quick start
```bash
# Install dependencies (once); a CRAN mirror must be given in non-interactive sessions:
Rscript -e 'install.packages("psych", repos = "https://cloud.r-project.org")'

# Reproduce the Summary-mode agreement matrix:
Rscript docs/reliability/compute_agreement.R \
  path/to/scorer_A_summary.csv \
  path/to/scorer_B_summary.csv \
  reliability_summary_R.csv
```
The script prints the per-metric agreement table and writes it to the
output CSV. If no third argument is given, the output defaults to
reliability_summary_R.csv in the current working directory.
Definitions
ICC(2,1) here refers to Pingouin's ICC2 output, corresponding to
the ICC2 row returned by psych::ICC: a single-rater, absolute-agreement
ICC.
Terminology differs across ICC conventions and software packages. In the
Shrout and Fleiss / Pingouin convention, ICC2 treats raters as random
and ICC3 treats raters as fixed. In McGraw and Wong's terminology, the
same numerical form can be labeled either a two-way random or a two-way
mixed absolute-agreement ICC. RABET therefore reports both the software
label (ICC2) and the form ICC(2,1), and users should interpret the
fixed/random rater assumption according to their study design.
Pearson r is the standard product-moment correlation across the matched animals.
Mean absolute difference is mean(abs(A - B)) over animals present in
both summary files.
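Both definitions are short enough to write out directly. The Python sketch below takes equal-length lists of matched per-animal values, such as the output of the row-matching step. The helper names and the sample values are hypothetical. It returns NaN for r when either column is constant, because the correlation is undefined in that case.

```python
from math import sqrt
from statistics import fmean


def pearson_r(a, b):
    """Product-moment correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sxy = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sxx = sum((x - ma) ** 2 for x in a)
    syy = sum((y - mb) ** 2 for y in b)
    # A constant column (e.g. all zeros) makes sxx or syy 0; r is then undefined.
    return sxy / sqrt(sxx * syy) if sxx and syy else float("nan")


def mean_abs_diff(a, b):
    """mean(abs(A - B)) over the matched animals."""
    return fmean(abs(x - y) for x, y in zip(a, b))


scorer_a = [4.0, 7.0, 2.0, 9.0]   # hypothetical matched values
scorer_b = [5.0, 7.0, 2.0, 8.0]
print(round(pearson_r(scorer_a, scorer_b), 4))  # 0.9725
print(mean_abs_diff(scorer_a, scorer_b))        # 0.5
```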
Expected differences from RABET's in-app output
The two implementations should agree to within ~1e-6 for ICC and r, and exactly for mean absolute difference. Larger discrepancies usually mean:
- One side dropped an animal that the other kept (check unmatched_a/unmatched_b in RABET's status panel and the R script's stdout).
- Numerical precision differences in how the underlying linear mixed-model solver handles degenerate inputs (e.g. all-zero columns).
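The tolerance check can be applied mechanically, as in the Python sketch below. It assumes both sets of results have already been loaded into dictionaries keyed by (metric, statistic). `find_discrepancies`, the statistic labels icc2, pearson_r and mean_abs_diff, and the metric attack_count are hypothetical. The column names actually written by RABET and by compute_agreement.R may differ. The tolerances follow the figures given in this section: 1e-6 for ICC and r, and exact equality for mean absolute difference.

```python
import math

# Hypothetical statistic labels; tolerances follow the figures in this section.
TOLERANCE = {"icc2": 1e-6, "pearson_r": 1e-6, "mean_abs_diff": 0.0}


def find_discrepancies(in_app, r_ref):
    """Return (metric, statistic, in_app_value, r_value) for every mismatch.

    Both arguments map (metric, statistic) -> float. A key present on only
    one side is reported with None for the missing value.
    """
    problems = []
    for key in sorted(set(in_app) | set(r_ref)):
        x, y = in_app.get(key), r_ref.get(key)
        if x is None or y is None:
            problems.append((*key, x, y))
            continue
        tol = TOLERANCE.get(key[1], 1e-6)
        # NaN on both sides (degenerate input) counts as agreement.
        if math.isnan(x) and math.isnan(y):
            continue
        # With abs_tol=0.0, isclose accepts only exact equality.
        if not math.isclose(x, y, rel_tol=0.0, abs_tol=tol):
            problems.append((*key, x, y))
    return problems


in_app = {("attack_count", "icc2"): 0.8333333, ("attack_count", "mean_abs_diff"): 0.5}
r_ref = {("attack_count", "icc2"): 0.8333336, ("attack_count", "mean_abs_diff"): 0.5}
print(find_discrepancies(in_app, r_ref))  # []
```

If either side rounds values when writing its CSV, loosen the mean_abs_diff tolerance to match that rounding. Treating NaN on both sides as agreement is also a choice of this sketch, not documented RABET behavior.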
If the values still diverge beyond these tolerances after both causes have been ruled out, please open an issue at https://github.com/mi2e-K/RABET/issues with both CSVs attached.