
Documentation Index

Fetch the complete documentation index at: https://docs.complyhat.ai/llms.txt

Use this file to discover all available pages before exploring further.

ComplyHat runs zero internal LLM calls. Your host agent brings its own reasoning; ComplyHat runs the deterministic statistical methods below and returns structured, audit-tagged citations. Every metric value, threshold, pass/fail ruling, dataset row count, subgroup size, engine version, and random seed is persisted with the report — a third party can re-derive every finding from the same inputs.

Bias

Four fairness metrics. Each runs against a tabular dataset with an outcome column, a protected-class column, and — for two of them — a ground-truth column. All return a pass / fail ruling against a configurable threshold; the defaults below trace to legal or academic sources. Before any of the four tests run, a data-quality gate checks subgroup sample sizes (warns if n < 30), class imbalance (warns if the smallest subgroup is under 5% of the dataset), and missing values (warns if more than 10% of rows are missing the protected-class column). Warnings are carried into the report so a reviewer can assess whether a pass ruling is statistically meaningful.

Disparate impact (Four-Fifths Rule)

For each subgroup g, compute the favorable rate favorable(g) / total(g). The reference group is the subgroup with the highest favorable rate. The adverse impact ratio for any other subgroup is:
adverse_impact_ratio(g) = favorable_rate(g) / favorable_rate(reference)
Fail if any ratio falls below 0.80. Source: Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. §1607.4(D) (1978). Adopted by NIST AI RMF (MAP-5) and reused by NYC Local Law 144 audit rules.
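As a sketch of the computation above — the function and variable names here are illustrative, not ComplyHat's API — assuming a flat list of (group, favorable) decision rows:

```python
def adverse_impact_ratios(rows):
    """rows: iterable of (group_label, favorable_bool) decisions."""
    totals, favorable = {}, {}
    for g, fav in rows:
        totals[g] = totals.get(g, 0) + 1
        favorable[g] = favorable.get(g, 0) + int(fav)
    rates = {g: favorable[g] / totals[g] for g in totals}
    reference = max(rates, key=rates.get)            # subgroup with the highest favorable rate
    ratios = {g: rates[g] / rates[reference] for g in rates if g != reference}
    return ratios, all(r >= 0.80 for r in ratios.values())

# 60% favorable for group A vs 40% for group B
rows = [("A", True)] * 60 + [("A", False)] * 40 \
     + [("B", True)] * 40 + [("B", False)] * 60
ratios, passed = adverse_impact_ratios(rows)
# B's ratio is 0.40 / 0.60 ≈ 0.67, below 0.80, so the ruling is fail
```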

Statistical parity

statistical_parity_difference = max_group_rate − min_group_rate
Fail if the difference exceeds 0.10. The multi-group form avoids picking an arbitrary reference group; the additive form is more numerically stable than the Four-Fifths ratio when the reference rate is small. Source: Dwork, Hardt, Pitassi, Reingold, Zemel. Fairness Through Awareness. ITCS 2012.
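A minimal sketch, assuming per-subgroup favorable rates have already been computed (the dict below is hypothetical data):

```python
def statistical_parity_difference(group_rates):
    """group_rates: {group: favorable_rate}. Returns the max-minus-min spread."""
    rates = list(group_rates.values())
    return max(rates) - min(rates)

spd = statistical_parity_difference({"A": 0.62, "B": 0.55, "C": 0.48})
passed = spd <= 0.10   # 0.62 - 0.48 = 0.14 exceeds the 0.10 default, so this fails
```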

Equal opportunity

True positive rate per subgroup:
TPR(g) = true_positives(g) / actual_positives(g)
Fail if min(TPR) / max(TPR) < 0.80. Requires ground-truth labels — ComplyHat skips this test automatically when the model is deployed but not yet validated against outcomes. Source: Hardt, Price, Srebro. Equality of Opportunity in Supervised Learning. NeurIPS 2016.
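The TPR ratio test can be sketched as follows, assuming per-group confusion counts are already available (names are illustrative):

```python
def equal_opportunity(counts):
    """counts: {group: (true_positives, actual_positives)}."""
    tpr = {g: tp / ap for g, (tp, ap) in counts.items()}
    ratio = min(tpr.values()) / max(tpr.values())
    return tpr, ratio, ratio >= 0.80

tpr, ratio, passed = equal_opportunity({"A": (90, 100), "B": (63, 100)})
# TPRs of 0.90 and 0.63 give a ratio of 0.70, below 0.80, so this fails
```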

Predictive parity

Positive predictive value per subgroup:
PPV(g) = true_positives(g) / predicted_positives(g)
Fail if max(PPV) − min(PPV) > 0.10.
Predictive parity and equal opportunity cannot both hold when base rates differ across groups (Chouldechova 2016). ComplyHat reports both metrics and lets the audit context determine which matters for your use case. Do not suppress either metric from the report.
Source: Chouldechova. Fair prediction with disparate impact. FATML 2016. See also Kleinberg, Mullainathan, Raghavan. Inherent Trade-Offs in the Fair Determination of Risk Scores. ITCS 2017.
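The PPV gap test mirrors the equal-opportunity sketch; again the counts dict is hypothetical, not ComplyHat's input schema:

```python
def predictive_parity(counts):
    """counts: {group: (true_positives, predicted_positives)}."""
    ppv = {g: tp / pp for g, (tp, pp) in counts.items()}
    gap = max(ppv.values()) - min(ppv.values())
    return ppv, gap, gap <= 0.10

ppv, gap, passed = predictive_parity({"A": (80, 100), "B": (64, 100)})
# PPVs of 0.80 and 0.64 give a gap of 0.16, above 0.10, so this fails
```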

Drift

Drift testing compares a baseline distribution (typically training data) against a production distribution. Two methods form the standard pair; two more are available when the standard pair falls short.

Population Stability Index (PSI)

For each bin i:
psi_i = (p_prod_i − p_base_i) × ln(p_prod_i / p_base_i)
PSI is the sum across bins. Bins with zero counts are floored to a small epsilon to keep the log finite.
PSI range      Interpretation
< 0.10         No material change
0.10 – 0.25    Moderate drift — monitor
>= 0.25        Significant drift — investigate
Source: Yurdakul, Naranjo. Statistical Properties of the Population Stability Index. 2019. The 0.25 industry threshold predates the paper; see Siddiqi, Credit Risk Scorecards, Wiley 2006.
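A sketch of the per-bin sum with epsilon flooring; the exact epsilon value is an assumption, not ComplyHat's documented constant:

```python
import math

EPS = 1e-6   # floor for empty bins; the exact value is an assumption

def psi(p_base, p_prod):
    """Sum of (p_prod_i - p_base_i) * ln(p_prod_i / p_base_i) across bins."""
    total = 0.0
    for b, p in zip(p_base, p_prod):
        b, p = max(b, EPS), max(p, EPS)   # keep ln(p / b) finite
        total += (p - b) * math.log(p / b)
    return total

value = psi([0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10])
# ≈ 0.23: inside the 0.10 – 0.25 moderate-drift band
```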

Kolmogorov-Smirnov test

KS = max_x | F_prod(x) − F_base(x) |
ComplyHat returns the D-statistic and the two-sample p-value computed from the asymptotic Kolmogorov distribution. Drift is flagged when p < 0.05 and KS > 0.10. The dual gate is intentional: with large production samples, any real numeric feature will produce a statistically significant but trivially small KS statistic. Both conditions must hold. Source: Massey. The Kolmogorov-Smirnov Test for Goodness of Fit. JASA 1951. Standard in model risk since Federal Reserve SR 11-7 (2011) required ongoing monitoring of input data.
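A self-contained sketch of the D-statistic and the dual gate. The small-sample correction in the asymptotic p-value follows the standard Numerical Recipes approximation; ComplyHat's exact p-value computation may differ, and all names here are illustrative:

```python
import bisect
import math

def ks_two_sample(sample_base, sample_prod):
    """Two-sample KS D-statistic: max gap between empirical CDFs."""
    b, p = sorted(sample_base), sorted(sample_prod)
    n, m = len(b), len(p)
    return max(abs(bisect.bisect_right(b, x) / n - bisect.bisect_right(p, x) / m)
               for x in set(b) | set(p))

def ks_pvalue(d, n, m):
    """Asymptotic Kolmogorov p-value (Numerical Recipes small-sample correction)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return min(max(p, 0.0), 1.0)

base = [i / 100 for i in range(100)]
prod = [(i + 30) / 100 for i in range(100)]      # same grid shifted right by 0.3
d = ks_two_sample(base, prod)
p = ks_pvalue(d, len(base), len(prod))
flagged = p < 0.05 and d > 0.10                  # both gates must fire
```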

Additional methods

Jensen-Shannon divergence (bounded in [0, 1], useful when PSI is numerically unstable) and chi-squared (for categorical features) are also available. Reports include all metric values that ran, so an auditor sees the full picture regardless of which methods triggered a flag.
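For concreteness, Jensen-Shannon divergence computed in log base 2 (which gives the [0, 1] bound mentioned above) — a minimal sketch, not ComplyHat's implementation:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence in log base 2, bounded in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

same     = js_divergence([0.5, 0.5], [0.5, 0.5])   # 0.0: identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # 1.0: maximal divergence
```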

Explainability

Two model-agnostic local explainers. Both return per-feature attribution scores for a single prediction. You pass in the decision plus a set of neighbor or background decisions with their precomputed outcomes — ComplyHat does not call your model’s prediction function f.

LIME with intercept

Each neighbor is weighted by an exponential kernel of its Euclidean distance to the target decision in feature space. ComplyHat fits a weighted least-squares linear surrogate against the full design matrix — a leading column of ones plus the feature columns. The first coefficient is the intercept; the remaining coefficients are the per-feature slopes returned as local attributions. Without the intercept, the surrogate is forced through the origin, which biases slope estimates whenever the neighborhood mean is offset from zero. ComplyHat returns the intercept alongside the slopes so reviewers can audit it. Defaults: kernel width 0.75; up to 50,000 neighbors retained. Source: Ribeiro, Singh, Guestrin. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. KDD 2016.
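A single-feature sketch of the surrogate fit using the closed-form weighted least-squares solution. The kernel form exp(−d²/width²) is one common choice and an assumption here, as are all names; in the multi-feature case the same fit runs against the full design matrix with its leading column of ones:

```python
import math

def lime_1d(x0, neighbors, outcomes, kernel_width=0.75):
    """Weighted least-squares fit of y ≈ intercept + slope·x around x0."""
    w = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in neighbors]
    sw = sum(w)
    xm = sum(wi * x for wi, x in zip(w, neighbors)) / sw
    ym = sum(wi * y for wi, y in zip(w, outcomes)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, neighbors))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, neighbors, outcomes))
    slope = sxy / sxx
    intercept = ym - slope * xm          # reported alongside the slope for audit
    return intercept, slope

neighbors = [0.5, 0.8, 1.0, 1.2, 1.5]
outcomes = [2 * x + 3 for x in neighbors]     # exactly linear toy outcomes
intercept, slope = lime_1d(1.0, neighbors, outcomes)
# recovers slope 2 and intercept 3 on this noiseless example
```

Forcing `intercept = 0` here would visibly bias the slope whenever the weighted mean of the outcomes is offset from zero, which is the failure mode the section describes.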

Coalition-attribution proxy

Feature coalitions are enumerated (small feature sets) or sampled (large). Each coalition is weighted by the Kernel-SHAP kernel:
weight(S) = (M − 1) / (C(M, |S|) × |S| × (M − |S|))
ComplyHat solves weighted least squares against per-coalition outcome blends to produce a per-feature attribution ranking. This is not Shapley values. True Kernel SHAP substitutes absent features by sampling from a background distribution and re-evaluating the model’s prediction function f on the masked vector. Because ComplyHat cannot call f, the per-coalition outcome is approximated as:
outcome(S) = (|S| / M) · y_decision + (1 − |S| / M) · y_background_avg
The kernel weights produce a defensible feature-importance ranking, but the resulting numbers are coalition-fraction-weighted attributions, not Shapley values. Reports that use this method are labelled coalition_attribution and must not be presented to a regulator as Shapley values. Defaults: up to 50,000 coalitions; 10,000 background decisions retained. Background: Lundberg, Lee. A Unified Approach to Interpreting Model Predictions. NeurIPS 2017. The ComplyHat proxy implementation is not a substitute for that method.
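The two formulas above translate directly; this sketch shows the kernel weight and the coalition-fraction blend for a hypothetical 4-feature decision (all names are illustrative):

```python
from math import comb

def coalition_weight(M, s):
    """Kernel-SHAP weight for a coalition of size s out of M features."""
    return (M - 1) / (comb(M, s) * s * (M - s))

def blended_outcome(s, M, y_decision, y_background_avg):
    """The documented coalition-fraction blend standing in for model re-evaluation."""
    return (s / M) * y_decision + (1 - s / M) * y_background_avg

weights = {s: coalition_weight(4, s) for s in (1, 2, 3)}   # {1: 0.25, 2: 0.125, 3: 0.25}
blend = blended_outcome(1, 4, y_decision=1.0, y_background_avg=0.2)   # 0.25 + 0.15 = 0.4
```

Note the weights diverge at |S| = 0 and |S| = M, which is why only non-trivial coalition sizes appear.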

Completeness check

Both explainers report a completeness score: how closely the sum of attributions matches actual_prediction − baseline_prediction. Scores are in [0, 1]. A low completeness score on a low-sample run signals noisy attributions — treat it as a red flag before the explanation enters an audit trail.
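ComplyHat's exact scoring function is not documented here; the sketch below is one plausible mapping of the attribution-sum gap into [0, 1], offered purely as an assumption to make the idea concrete:

```python
def completeness_score(attributions, actual, baseline):
    """1.0 when attributions sum exactly to actual − baseline, decaying toward 0.
    This scoring formula is an assumption, not ComplyHat's documented method."""
    target = actual - baseline
    gap = abs(sum(attributions) - target)
    return max(0.0, 1.0 - gap / max(abs(target), 1e-9))

score = completeness_score([0.3, 0.1, 0.05], actual=0.9, baseline=0.4)
# ≈ 0.9: the attributions explain most of the 0.5 prediction delta
```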

Adversarial robustness

Adversarial testing probes whether a model’s prediction is stable under input perturbations. ComplyHat runs two test families.

Boundary robustness

For each test point, find the smallest perturbation — in L-infinity or L2 norm — that flips the model’s prediction. ComplyHat reports the median and 10th-percentile perturbation magnitudes across the test set. The pass threshold is use-case-dependent; your audit team sets it based on the plausible perturbation range for the use case (pixel-noise tolerance for vision models, rounding tolerance for tabular models). Source: Szegedy et al. Intriguing properties of neural networks. ICLR 2014. Method is a black-box variant of Carlini, Wagner. Towards Evaluating the Robustness of Neural Networks. IEEE S&P 2017.
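ComplyHat's search strategy is not specified above; as a simplified stand-in, a bisection along one fixed direction finds the smallest flipping magnitude for a black-box `predict` callable (the real method aggregates medians and 10th percentiles across a whole test set):

```python
def min_flip_linf(predict, x, direction, hi=1.0, iters=50):
    """Bisect the smallest magnitude along `direction` that flips predict(x).
    Assumes the prediction at magnitude `hi` already differs from the base."""
    base = predict(x)
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        probe = [xi + mid * di for xi, di in zip(x, direction)]
        if predict(probe) != base:
            hi = mid          # flip found: shrink toward the boundary
        else:
            lo = mid
    return hi

# Toy threshold model: class depends only on whether feature 0 exceeds 0.5
predict = lambda v: v[0] > 0.5
eps = min_flip_linf(predict, x=[0.3, 0.0], direction=[1.0, 0.0])
# eps converges to 0.2, the distance from 0.3 to the 0.5 decision boundary
```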

Data-quality robustness

ComplyHat injects realistic corruptions — missing values, out-of-range numerics, mistyped categoricals — at controlled rates (1%, 5%, 10%) and reports the delta in prediction distribution per corruption type. This measures graceful degradation under ordinary production-data errors, which is what operational teams actually face. EU AI Act Article 15 (§1, §3) explicitly requires this kind of robustness evidence for high-risk systems.
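A sketch of one corruption type — missing-value injection at a controlled rate — with a fixed seed so the run is reproducible, in the spirit of the Reproducibility section; the mechanics and names here are assumptions:

```python
import random

def inject_missing(rows, column, rate, seed=0):
    """Null out `column` in roughly `rate` of rows; the fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = dict(row)               # copy, leaving the caller's rows intact
        if rng.random() < rate:
            row[column] = None
        out.append(row)
    return out

rows = [{"income": 100 + i} for i in range(1000)]
corrupted = inject_missing(rows, "income", rate=0.05)
hit = sum(r["income"] is None for r in corrupted)
# hit lands near 50 of 1000 rows at the 5% corruption rate
```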

Reproducibility

For every report, ComplyHat persists:
  • Metric values and thresholds
  • Pass/fail rulings
  • Dataset row count and subgroup sizes
  • Data-quality warnings
  • Engine version
  • Random seeds used in sampling steps
The engines are deterministic under fixed seeds. A third party can re-derive every number in any ComplyHat report given the same inputs and engine version.
ComplyHat runs zero internal LLM calls. Host agents (Claude Code, Codex, custom MCP clients) bring their own reasoning. ComplyHat returns structured citations and audit-tagged prose — never synthesized findings.

Next steps

Supported frameworks

Which metrics each regulator requires, at what cadence, and for which protected classes.

Tool reference

The MCP entry points that invoke these methods, with example requests and responses.