Bias Detection Score Calculator

Calculates a composite Bias Detection Score (BDS) using demographic parity difference, equalized odds difference, and calibration error — three foundational fairness metrics used in algorithmic auditing and AI ethics research.

Formulas Used

1. Demographic Parity Difference (DPD)

DPD = |PR_A − PR_B|

Measures whether both groups receive positive predictions at equal rates, regardless of ground truth. A DPD of 0 indicates perfect demographic parity.
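In code, this reduces to a one-line comparison. A minimal Python sketch (the function name is illustrative and inputs are assumed to be floats in [0, 1]):

    def demographic_parity_difference(pr_a: float, pr_b: float) -> float:
        # Absolute gap in positive prediction rates between the two groups.
        return abs(pr_a - pr_b)

    # Example: Group A receives positives at 60%, Group B at 45%:
    # demographic_parity_difference(0.60, 0.45)  ->  0.15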

2. Equalized Odds Difference (EOD)

EOD = 0.5 × (|TPR_A − TPR_B| + |FPR_A − FPR_B|)

Averages the absolute differences in True Positive Rate (TPR) and False Positive Rate (FPR) across groups. This is a standard relaxation of the equalized odds criterion of Hardt et al. (2016), which constrains both error rates jointly.
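A corresponding sketch, assuming per-group TPR and FPR have already been computed from a labeled evaluation set:

    def equalized_odds_difference(tpr_a: float, fpr_a: float,
                                  tpr_b: float, fpr_b: float) -> float:
        # Mean of the absolute TPR and FPR gaps across groups.
        return 0.5 * (abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b))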

3. Calibration Difference (CD)

CD = |CAL_A − CAL_B|

Compares mean predicted probabilities between groups. A well-calibrated model should produce similar mean scores for both groups when base rates are equal.
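A sketch that derives each group's CAL directly from per-example predicted probabilities (the score lists are assumed inputs, not part of the calculator's interface):

    def calibration_difference(scores_a: list, scores_b: list) -> float:
        # CAL_X is the mean predicted probability within group X.
        cal_a = sum(scores_a) / len(scores_a)
        cal_b = sum(scores_b) / len(scores_b)
        return abs(cal_a - cal_b)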

4. Composite Bias Detection Score (BDS)

BDS = w₁ × DPD + w₂ × EOD + w₃ × CD

A weighted composite score. Because each component metric lies in [0, 1] and the weights are non-negative and sum to 1, BDS also lies in [0, 1]. Default equal weights (w₁ = w₂ = w₃ ≈ 0.333) treat all three fairness criteria equally.
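Putting the pieces together, a sketch of the composite, with weight validation mirroring the constraint stated in the Assumptions below (component values are assumed to come from functions like those sketched above):

    import math

    def bias_detection_score(dpd: float, eod: float, cd: float,
                             weights: tuple = (1/3, 1/3, 1/3)) -> float:
        # Weights must be non-negative and sum to 1.0.
        w1, w2, w3 = weights
        if min(weights) < 0 or not math.isclose(w1 + w2 + w3, 1.0):
            raise ValueError("weights must be non-negative and sum to 1.0")
        return w1 * dpd + w2 * eod + w3 * cd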

5. Disparate Impact Ratio (DIR) — Supplementary

DIR = PR_B / PR_A

Per the EEOC 4/5ths (80%) rule: a DIR below 0.80 indicates potential adverse impact against Group B. DIR is not included in BDS but is reported as a supplementary indicator.
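A sketch of the supplementary check, guarding against a zero reference rate (an edge case the formula itself leaves undefined):

    def disparate_impact_ratio(pr_a: float, pr_b: float) -> float:
        # PR_B / PR_A; undefined when the reference group's rate is zero.
        if pr_a == 0:
            raise ValueError("reference group positive rate must be > 0")
        return pr_b / pr_a

    # Example: disparate_impact_ratio(0.60, 0.45) -> 0.75, which falls
    # below the 0.80 threshold, so potential adverse impact is flagged.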

Severity Thresholds (BDS)

BDS Range       Classification
0.00 – 0.049    Minimal / No Detectable Bias
0.05 – 0.099    Low Bias
0.10 – 0.199    Moderate Bias
0.20 – 0.349    High Bias
≥ 0.35          Severe Bias
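One way to encode the table, assuming the bands are contiguous and half-open (each upper bound is exclusive up to the start of the next band):

    def classify_bds(bds: float) -> str:
        # Map a BDS value onto the severity bands in the table above.
        if bds < 0.05:
            return "Minimal / No Detectable Bias"
        if bds < 0.10:
            return "Low Bias"
        if bds < 0.20:
            return "Moderate Bias"
        if bds < 0.35:
            return "High Bias"
        return "Severe Bias"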

Assumptions & References

  • Group A is treated as the reference group (e.g., majority or privileged group); Group B is the comparison group (e.g., minority or protected group). The choice of reference group affects DIR but not BDS.
  • All rate inputs (TPR, FPR, PR, CAL) must lie in [0, 1]. TPR ≥ FPR is expected for any model performing above random chance.
  • Weights w₁, w₂, w₃ must be non-negative and sum to exactly 1.0. Adjust weights to prioritize specific fairness criteria based on domain context.
  • BDS assumes equal base rates across groups. If base rates differ substantially, calibration difference may reflect real-world prevalence rather than model bias — interpret CD with caution in such cases.
  • This calculator evaluates group fairness (statistical parity), not individual fairness. These two notions can be in tension (Chouldechova, 2017).
  • Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS 2016.
  • Chouldechova, A. (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153–163.
  • EEOC Uniform Guidelines (1978). 4/5ths (80%) rule for adverse impact — Disparate Impact Ratio threshold of 0.80.
  • Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press.
  • BDS severity thresholds are adapted from common fairness audit practice; thresholds may vary by regulatory context, domain risk level, and organizational policy.
