Documentation Index

Fetch the complete documentation index at: https://docs.complyhat.ai/llms.txt

Use this file to discover all available pages before exploring further.

ComplyHat’s bias engine runs four fairness metrics deterministically against tabular data you supply. Each test returns a pass or fail ruling against a configurable threshold, per protected class, with data-quality assessments that tell you whether the result is statistically meaningful. When any test fails, the model’s compliance_status is automatically updated to non_compliant; on a clean run it moves to needs_review.

The four test types

All four tests operate on a tabular dataset with an outcome column and one or more protected-class columns. For the statistical details and academic sources behind each metric, see methodology.
| Test type | What it measures | Threshold | Ground truth required? |
| --- | --- | --- | --- |
| disparate_impact | Favorable rate ratio between subgroups (Four-Fifths Rule) | Fail if any ratio < 0.80 | No |
| statistical_parity | Absolute difference in favorable rates across subgroups | Fail if difference > 0.10 | No |
| equal_opportunity | True positive rate ratio across subgroups | Fail if min/max TPR < 0.80 | Yes |
| predictive_parity | Positive predictive value difference across subgroups | Fail if max − min PPV > 0.10 | Yes |
If you include equal_opportunity or predictive_parity in test_types, you must also supply ground_truth_column. The engine returns a 422 error if you omit it.
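The two tests that need no ground truth reduce to a few lines of arithmetic. The sketch below is an illustrative reimplementation of the thresholds in the table above, not the engine's code, and all function names are mine:

```python
from collections import defaultdict

def favorable_rates(rows, protected_class, outcome_column, favorable_outcome):
    """Favorable-outcome rate per subgroup of one protected class."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for row in rows:
        group = row[protected_class]
        totals[group] += 1
        if row[outcome_column] == favorable_outcome:
            favorable[group] += 1
    return {g: favorable[g] / totals[g] for g in totals}

def disparate_impact_fails(rates, threshold=0.80):
    """Four-Fifths Rule: fail if any subgroup's favorable rate is below
    `threshold` times the highest subgroup's rate."""
    reference = max(rates.values())
    return any(r / reference < threshold for r in rates.values())

def statistical_parity_fails(rates, threshold=0.10):
    """Fail if the spread between the highest and lowest favorable rates
    exceeds the threshold."""
    return max(rates.values()) - min(rates.values()) > threshold
```

Equal opportunity and predictive parity work the same way, except the rates are computed from true positives and predicted positives against the ground-truth column.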

Run a bias test

Call bias_tests with mode: "run". Supply the model_id, your dataset inline in data.rows, the column names, and the test types you want. The data object requires source: "inline".
{
  "tool": "bias_tests",
  "arguments": {
    "mode": "run",
    "model_id": "mdl_01j9z...",
    "framework": "nyc-ll144",
    "test_types": ["disparate_impact", "statistical_parity"],
    "protected_classes": ["gender", "race"],
    "outcome_column": "hired",
    "favorable_outcome": "1",
    "data": {
      "source": "inline",
      "rows": [
        { "gender": "F", "race": "Black", "hired": "1", "score": 0.82 },
        { "gender": "M", "race": "White", "hired": "1", "score": 0.91 },
        { "gender": "F", "race": "Hispanic", "hired": "0", "score": 0.61 }
      ]
    }
  }
}
For tests that require ground truth labels, add the ground_truth_column field:
{
  "tool": "bias_tests",
  "arguments": {
    "mode": "run",
    "model_id": "mdl_01j9z...",
    "framework": "eu-ai-act",
    "test_types": ["disparate_impact", "statistical_parity", "equal_opportunity", "predictive_parity"],
    "protected_classes": ["gender", "age_group"],
    "outcome_column": "prediction",
    "favorable_outcome": "approved",
    "ground_truth_column": "actual_outcome",
    "data": {
      "source": "inline",
      "rows": []
    }
  }
}
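A client-side pre-flight check can catch a missing ground_truth_column before the request round-trips into a 422. A minimal sketch; the constant and function names are mine, and only the 422 rule itself comes from this page:

```python
GROUND_TRUTH_TESTS = {"equal_opportunity", "predictive_parity"}

def check_ground_truth(arguments):
    """Mirror the server-side 422: tests that compare predictions against
    actual outcomes require a ground_truth_column in the arguments."""
    needed = GROUND_TRUTH_TESTS & set(arguments["test_types"])
    if needed and not arguments.get("ground_truth_column"):
        raise ValueError(
            f"ground_truth_column is required for: {sorted(needed)}"
        )
```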
The response contains a test_id, an overall_result (pass or fail), per-(test_type, protected_class) results with details, and a data_quality assessment for each protected class. Check data_quality[*].adequate and data_quality[*].warnings before treating a pass as conclusive.
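Given those response fields, a conservative caller might gate on data quality like this. The response shape here is assumed from the description above, not a verified schema:

```python
def pass_is_conclusive(response):
    """Treat a pass as conclusive only if every protected class had
    adequate, warning-free data; otherwise it needs human review."""
    if response["overall_result"] != "pass":
        return False
    return all(
        dq["adequate"] and not dq["warnings"]
        for dq in response["data_quality"]
    )
```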

List and retrieve test results

List all bias tests for a model with mode: "list":
{
  "tool": "bias_tests",
  "arguments": {
    "mode": "list",
    "model_id": "mdl_01j9z..."
  }
}
Retrieve a specific result by its test_id with mode: "get":
{
  "tool": "bias_tests",
  "arguments": {
    "mode": "get",
    "test_id": "bt_01m5..."
  }
}

Schedule recurring tests

Regulators specify both the test types and the cadence they expect. You can encode both into a recurring schedule so tests run without manual intervention. Create a schedule with mode: "create_schedule". Provide the dataset_id to run against, a test_config object describing the test parameters, the cadence (monthly, quarterly, or annually), and the next_run_at timestamp for the first run.
{
  "tool": "bias_tests",
  "arguments": {
    "mode": "create_schedule",
    "model_id": "mdl_01j9z...",
    "dataset_id": "ds_01k3...",
    "cadence": "quarterly",
    "next_run_at": "2026-07-01T00:00:00Z",
    "test_config": {
      "framework": "sr-11-7",
      "test_types": ["disparate_impact", "statistical_parity"],
      "protected_classes": ["gender", "race", "age_group"],
      "outcome_column": "prediction",
      "favorable_outcome": "approved"
    }
  }
}
List all active schedules for a model with mode: "list_schedules":
{
  "tool": "bias_tests",
  "arguments": {
    "mode": "list_schedules",
    "model_id": "mdl_01j9z..."
  }
}

Framework-specific requirements

Different frameworks require different test types and cadences. Configure your schedules accordingly:
| Framework | Required tests | Cadence |
| --- | --- | --- |
| sr-11-7 | disparate_impact, statistical_parity | Quarterly |
| eu-ai-act | disparate_impact, statistical_parity, equal_opportunity, predictive_parity | Quarterly |
| nyc-ll144 | disparate_impact, statistical_parity | Annual (per AEDT use case) |
| naic-model-bulletin | disparate_impact | Annual |
| cms-0057-f | disparate_impact, equal_opportunity | Quarterly |
NYC Local Law 144 requires the annual bias audit to be conducted by an independent auditor. ComplyHat produces the technical artefacts — the independence requirement is a legal and operational matter your organization must arrange separately.
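The framework table above can be encoded as a lookup for validating a schedule's test_config before creating it. The mapping mirrors the table; the helper itself is illustrative:

```python
FRAMEWORK_REQUIRED_TESTS = {
    "sr-11-7": {"disparate_impact", "statistical_parity"},
    "eu-ai-act": {"disparate_impact", "statistical_parity",
                  "equal_opportunity", "predictive_parity"},
    "nyc-ll144": {"disparate_impact", "statistical_parity"},
    "naic-model-bulletin": {"disparate_impact"},
    "cms-0057-f": {"disparate_impact", "equal_opportunity"},
}

def missing_required_tests(framework, test_types):
    """Required test types that the given test_types list does not cover."""
    return FRAMEWORK_REQUIRED_TESTS[framework] - set(test_types)
```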

Next steps: Review the statistical methods behind each test in methodology, or see all bias_tests modes in the tool reference.