ComplyHat returns deterministic, reproducible numbers — the kind a regulator can re-derive from your inputs. It does not synthesize prose, run an internal LLM, or call your model’s prediction function. Your MCP-capable host (Claude Code, Claude Desktop, Codex Desktop, Codex CLI, OpenClaw, NemoClaw, or any client speaking streamable-HTTP MCP) brings the reasoning. Every report persists the metric values, thresholds, dataset row counts, subgroup sizes, data-quality warnings, engine version, and random seeds — so a third party can re-derive any finding from the same inputs.
Documentation Index
Fetch the complete documentation index at: https://docs.complyhat.ai/llms.txt
Use this file to discover all available pages before exploring further.
Bias
Four fairness tests. Each runs against a tabular dataset and returns a pass/fail ruling. Defaults trace to legal or academic sources.
- Disparate impact (Four-Fifths Rule) — fail if any subgroup’s favorable rate is below 80% of the highest. Source: 29 CFR §1607.4(D), 1978.
- Statistical parity — fail if the gap between the highest and lowest subgroup rates exceeds 0.10. Source: Dwork et al., ITCS 2012.
- Equal opportunity — fail if the lowest subgroup TPR is below 80% of the highest. Requires ground-truth labels. Source: Hardt et al., NeurIPS 2016.
- Predictive parity — fail if the gap between subgroup PPVs exceeds 0.10. Source: Chouldechova, FATML 2016.
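The four rulings above reduce to a few lines of numpy. A minimal sketch — the function name and signature are illustrative, not ComplyHat's tool interface; it assumes binary outcomes with 1 as the favorable class, and that every subgroup has at least one positive label and one positive prediction:

```python
import numpy as np

def fairness_rulings(groups, y_pred, y_true):
    """Return pass (True) / fail (False) for the four fairness tests."""
    gs = np.unique(groups)
    # Favorable rate, true-positive rate, and positive predictive value per subgroup.
    rates = np.array([y_pred[groups == g].mean() for g in gs])
    tprs = np.array([y_pred[(groups == g) & (y_true == 1)].mean() for g in gs])
    ppvs = np.array([y_true[(groups == g) & (y_pred == 1)].mean() for g in gs])
    return {
        "disparate_impact": bool(rates.min() / rates.max() >= 0.80),   # Four-Fifths Rule
        "statistical_parity": bool(rates.max() - rates.min() <= 0.10),
        "equal_opportunity": bool(tprs.min() / tprs.max() >= 0.80),
        "predictive_parity": bool(ppvs.max() - ppvs.min() <= 0.10),
    }
```

With favorable rates of 0.8 and 0.4 across two subgroups, the ratio 0.5 fails the Four-Fifths Rule and the 0.4 gap fails statistical parity, while perfect predictions keep both label-conditioned tests passing.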
Drift
Compares a baseline distribution (typically training) against production.
- Population Stability Index (PSI) — < 0.10 no material change, 0.10–0.25 monitor, >= 0.25 investigate. Source: Yurdakul & Naranjo, 2019.
- Kolmogorov-Smirnov — flags when p < 0.05 and KS > 0.10 (dual gate, since at large sample sizes even trivial shifts in a real feature become statistically significant). Source: Massey, JASA 1951.
- Jensen-Shannon divergence and chi-squared are also available; reports include all metrics that ran.
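The PSI computation and banding can be sketched as follows. The quantile binning, bin count, and the 1e-6 floor on empty bins are assumptions for illustration — ComplyHat's actual binning strategy may differ:

```python
import numpy as np

def psi(baseline, production, bins=10):
    """Population Stability Index over quantile bins of the baseline."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range production values
    b, _ = np.histogram(baseline, edges)
    p, _ = np.histogram(production, edges)
    b = np.clip(b / b.sum(), 1e-6, None)        # floor empty bins to avoid log(0)
    p = np.clip(p / p.sum(), 1e-6, None)
    return float(np.sum((p - b) * np.log(p / b)))

def psi_band(value):
    """Map a PSI value onto the documented action bands."""
    if value < 0.10:
        return "no material change"
    return "monitor" if value < 0.25 else "investigate"
```

A one-standard-deviation mean shift in a normal feature lands well above the 0.25 investigate threshold; an identical distribution scores zero.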
Explainability
Two model-agnostic local explainers. Both return per-feature attributions.
- LIME with intercept — weighted least-squares surrogate fit to neighbors weighted by an exponential kernel. Returns the intercept alongside the slopes so reviewers can audit it. Defaults: kernel width 0.75, up to 50,000 neighbors. Source: Ribeiro et al., KDD 2016.
- Coalition attribution — Kernel-SHAP-weighted coalitions with outcomes blended as (|S| / M) · y_decision + (1 − |S| / M) · y_background_avg. Not Shapley values — ComplyHat cannot call your model’s f. Reports label this coalition_attribution; do not present the numbers to a regulator as Shapley values. Inspired by Lundberg & Lee, NeurIPS 2017.
Each run also reports a score in [0, 1] — how closely the sum of attributions matches actual_prediction − baseline_prediction. Low scores flag noisy runs.
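Both explainers are small enough to sketch in numpy. Assumptions: the host supplies the perturbed neighbors and their model outputs (ComplyHat never calls f itself), and all names below are illustrative rather than ComplyHat's API:

```python
import numpy as np
from math import comb

def lime_with_intercept(x, neighbors, preds, kernel_width=0.75):
    """Weighted least-squares surrogate around x -> (intercept, slopes)."""
    d2 = np.sum((neighbors - x) ** 2, axis=1)
    w = np.exp(-d2 / kernel_width**2)            # exponential kernel weights
    X = np.hstack([np.ones((len(neighbors), 1)), neighbors - x])
    sw = np.sqrt(w)                              # weighted least squares via row scaling
    beta = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)[0]
    return beta[0], beta[1:]                     # auditable intercept + per-feature slopes

def coalition_outcome(S, M, y_decision, y_background_avg):
    """Blended outcome for coalition S of M features: a linear interpolation,
    not a model call -- hence coalition_attribution, not Shapley values."""
    frac = len(S) / M
    return frac * y_decision + (1 - frac) * y_background_avg

def kernel_shap_weight(s, M):
    """Shapley-kernel weight for a coalition of size 0 < s < M."""
    return (M - 1) / (comb(M, s) * s * (M - s))
```

On a noise-free linear target the surrogate recovers the true slopes exactly and the intercept equals the prediction at x, which is what makes the reported intercept auditable.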
Adversarial robustness
- Boundary robustness — smallest perturbation (L-infinity or L2) that flips the prediction. Reports median and 10th-percentile magnitudes. The pass threshold depends on the regulatory use case. Source: Szegedy et al., ICLR 2014.
- Data-quality robustness — inject realistic corruptions (missing values, out-of-range numerics, mistyped categoricals) at 1%, 5%, 10% rates and report prediction-distribution deltas. Required for EU AI Act Article 15 robustness evidence.
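The corruption-injection step can be sketched as below, assuming an all-numeric feature matrix; the corruption menu (here a 50/50 split of missing values and out-of-range numerics) and the sampling logic are illustrative, not ComplyHat's real implementation:

```python
import numpy as np

def inject_corruptions(X, rate, rng):
    """Corrupt a fraction `rate` of cells: roughly half become missing (NaN),
    the rest out-of-range (10x the column max). The caller re-scores both
    matrices and reports the prediction-distribution delta."""
    Xc = X.astype(float).copy()
    idx = rng.choice(X.size, size=int(rate * X.size), replace=False)
    rows, cols = np.unravel_index(idx, X.shape)
    for r, c in zip(rows, cols):
        Xc[r, c] = np.nan if rng.random() < 0.5 else 10 * X[:, c].max()
    return Xc
```

Running it at the documented 1%, 5%, and 10% rates and comparing prediction distributions at each rate yields the deltas cited as Article 15 robustness evidence.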
ComplyHat runs zero internal LLM calls. Host agents bring the reasoning; ComplyHat returns structured citations and audit-tagged prose ([EXTRACTED] / [INFERRED] / [AMBIGUOUS]).