# Bias Detection Score Calculator
Calculates a composite Bias Detection Score (BDS) using demographic parity difference, equalized odds difference, and calibration error — three foundational fairness metrics used in algorithmic auditing and AI ethics research.
## Formulas Used
### 1. Demographic Parity Difference (DPD)

DPD = |PR_A − PR_B|

where PR_A and PR_B are the positive prediction rates (selection rates) for Groups A and B. DPD measures whether both groups receive positive predictions at equal rates, regardless of ground truth. A DPD of 0 indicates perfect demographic parity.
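As a minimal sketch (the function name is illustrative, not part of a published API), DPD reduces to an absolute difference of two rates:

```python
def demographic_parity_difference(pr_a: float, pr_b: float) -> float:
    """Absolute gap in positive prediction rates between Group A and Group B."""
    return abs(pr_a - pr_b)

# Example: Group A selected at 50%, Group B at 25%
print(demographic_parity_difference(0.5, 0.25))  # 0.25
```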
### 2. Equalized Odds Difference (EOD)

EOD = 0.5 × (|TPR_A − TPR_B| + |FPR_A − FPR_B|)

Averages the absolute differences in True Positive Rates (TPR) and False Positive Rates (FPR) across groups. Proposed by Hardt et al. (2016) as a joint constraint on both error rates.
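A direct sketch of the formula above (function name illustrative):

```python
def equalized_odds_difference(tpr_a: float, fpr_a: float,
                              tpr_b: float, fpr_b: float) -> float:
    """Mean of the absolute TPR and FPR gaps between the two groups."""
    return 0.5 * (abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b))

# Example: a TPR gap of 0.25 and an FPR gap of 0.25 average to 0.25
print(equalized_odds_difference(0.75, 0.25, 0.5, 0.0))  # 0.25
```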
### 3. Calibration Difference (CD)

CD = |CAL_A − CAL_B|

where CAL_A and CAL_B are the mean predicted probabilities for each group. A well-calibrated model should produce similar mean scores for both groups when base rates are equal.
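Computing CD from raw per-individual scores might look like this (a sketch; names are illustrative):

```python
def calibration_difference(scores_a: list[float], scores_b: list[float]) -> float:
    """Absolute difference in mean predicted probability between groups."""
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    return abs(mean_a - mean_b)

# Example: mean scores of 0.5 and 0.25 give CD = 0.25
print(calibration_difference([0.5, 0.5], [0.25, 0.25]))  # 0.25
```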
### 4. Composite Bias Detection Score (BDS)

BDS = w₁ × DPD + w₂ × EOD + w₃ × CD

A weighted composite score ∈ [0, 1]. Default equal weights (w₁ = w₂ = w₃ = 1/3) treat all three fairness criteria equally. Weights must be non-negative and sum to 1.
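A sketch of the composite, including the weight checks described above (function name illustrative; the 1e-9 tolerance for the sum-to-1 check is an assumption to absorb floating-point error):

```python
def bias_detection_score(dpd: float, eod: float, cd: float,
                         weights: tuple[float, float, float] = (1/3, 1/3, 1/3)) -> float:
    """Weighted composite of the three fairness gaps; weights must sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w1, w2, w3 = weights
    return w1 * dpd + w2 * eod + w3 * cd
```

Because DPD, EOD, and CD each lie in [0, 1] and the weights sum to 1, the composite is guaranteed to stay in [0, 1].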
### 5. Supplementary Metric: Disparate Impact Ratio (DIR)

DIR = PR_B / PR_A

Per the EEOC four-fifths (80%) rule, a DIR below 0.80 indicates potential adverse impact against Group B. DIR is not included in the BDS composite but is reported as a supplementary indicator; it is undefined when PR_A = 0.
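The ratio and the four-fifths check can be sketched as follows (function names illustrative; raising on PR_A = 0 is one reasonable way to handle the undefined case):

```python
def disparate_impact_ratio(pr_a: float, pr_b: float) -> float:
    """Ratio of Group B's selection rate to the reference Group A's rate."""
    if pr_a == 0:
        raise ValueError("DIR is undefined when the reference rate PR_A is 0")
    return pr_b / pr_a

def violates_four_fifths_rule(pr_a: float, pr_b: float) -> bool:
    """True when DIR falls below the EEOC 0.80 threshold."""
    return disparate_impact_ratio(pr_a, pr_b) < 0.80

print(disparate_impact_ratio(0.5, 0.3))     # 0.6
print(violates_four_fifths_rule(0.5, 0.3))  # True
```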
## Severity Thresholds (BDS)
| BDS Range | Classification |
|---|---|
| 0.00 – 0.049 | Minimal / No Detectable Bias |
| 0.05 – 0.099 | Low Bias |
| 0.10 – 0.199 | Moderate Bias |
| 0.20 – 0.349 | High Bias |
| ≥ 0.35 | Severe Bias |
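Reading each band as half-open (lower bound inclusive, upper bound exclusive — an assumption consistent with the ranges above), the table maps to a simple cascade; the function name is illustrative:

```python
def classify_bds(bds: float) -> str:
    """Map a BDS value to its severity band (bands treated as half-open)."""
    if bds < 0.05:
        return "Minimal / No Detectable Bias"
    if bds < 0.10:
        return "Low Bias"
    if bds < 0.20:
        return "Moderate Bias"
    if bds < 0.35:
        return "High Bias"
    return "Severe Bias"

print(classify_bds(0.12))  # Moderate Bias
```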
## Assumptions & References
- Group A is treated as the reference group (e.g., majority or privileged group); Group B is the comparison group (e.g., minority or protected group). The choice of reference group affects DIR but not BDS.
- All rate inputs (TPR, FPR, PR, CAL) must lie in [0, 1]. TPR ≥ FPR is expected for any model performing above random chance.
- Weights w₁, w₂, w₃ must be non-negative and sum to exactly 1.0. Adjust weights to prioritize specific fairness criteria based on domain context.
- BDS assumes equal base rates across groups. If base rates differ substantially, calibration difference may reflect real-world prevalence rather than model bias — interpret CD with caution in such cases.
- This calculator evaluates group fairness (statistical parity), not individual fairness. These two notions can be in tension (Chouldechova, 2017).
- Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS 2016.
- Chouldechova, A. (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153–163.
- EEOC Uniform Guidelines (1978). 4/5ths (80%) rule for adverse impact — Disparate Impact Ratio threshold of 0.80.
- Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press.
- BDS severity thresholds are adapted from common fairness audit practice; thresholds may vary by regulatory context, domain risk level, and organizational policy.