Bias Detection Score Calculator

Calculates a composite Bias Detection Score (BDS) using demographic parity difference, equalized odds difference, and calibration error — three foundational fairness metrics used in algorithmic auditing and AI ethics research.

Formulas Used

1. Demographic Parity Difference (DPD)

DPD = |PR_A − PR_B|

Measures whether both groups receive positive predictions at equal rates, regardless of ground truth. A DPD of 0 indicates perfect demographic parity.
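In code, this reduces to a one-line comparison. A minimal Python sketch (the function name is illustrative and inputs are assumed to be floats in [0, 1]):

    def demographic_parity_difference(pr_a: float, pr_b: float) -> float:
        # Absolute gap in positive prediction rates between the two groups.
        return abs(pr_a - pr_b)

    # Example: Group A receives positives at 60%, Group B at 45%:
    # demographic_parity_difference(0.60, 0.45)  ->  0.15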

2. Equalized Odds Difference (EOD)

EOD = 0.5 × (|TPR_A − TPR_B| + |FPR_A − FPR_B|)

Averages the absolute differences in True Positive Rate (TPR) and False Positive Rate (FPR) across groups. This is a standard relaxation of the equalized odds criterion of Hardt et al. (2016), which constrains both error rates jointly.
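A corresponding sketch, assuming per-group TPR and FPR have already been computed from a labeled evaluation set:

    def equalized_odds_difference(tpr_a: float, fpr_a: float,
                                  tpr_b: float, fpr_b: float) -> float:
        # Mean of the absolute TPR and FPR gaps across groups.
        return 0.5 * (abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b))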

3. Calibration Difference (CD)

CD = |CAL_A − CAL_B|

Compares mean predicted probabilities between groups. A well-calibrated model should produce similar mean scores for both groups when base rates are equal.
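A sketch that derives each group's CAL directly from per-example predicted probabilities (the score lists are assumed inputs, not part of the calculator's interface):

    def calibration_difference(scores_a: list, scores_b: list) -> float:
        # CAL_X is the mean predicted probability within group X.
        cal_a = sum(scores_a) / len(scores_a)
        cal_b = sum(scores_b) / len(scores_b)
        return abs(cal_a - cal_b)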

4. Composite Bias Detection Score (BDS)

BDS = w₁ × DPD + w₂ × EOD + w₃ × CD

A weighted composite score. Because each component metric lies in [0, 1] and the weights are non-negative and sum to 1, BDS also lies in [0, 1]. Default equal weights (w₁ = w₂ = w₃ ≈ 0.333) treat all three fairness criteria equally.
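Putting the pieces together, a sketch of the composite, with weight validation mirroring the constraint stated in the Assumptions below (component values are assumed to come from functions like those sketched above):

    import math

    def bias_detection_score(dpd: float, eod: float, cd: float,
                             weights: tuple = (1/3, 1/3, 1/3)) -> float:
        # Weights must be non-negative and sum to 1.0.
        w1, w2, w3 = weights
        if min(weights) < 0 or not math.isclose(w1 + w2 + w3, 1.0):
            raise ValueError("weights must be non-negative and sum to 1.0")
        return w1 * dpd + w2 * eod + w3 * cd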

5. Disparate Impact Ratio (DIR) — Supplementary

DIR = PR_B / PR_A

Per the EEOC 4/5ths (80%) rule: a DIR below 0.80 indicates potential adverse impact against Group B. DIR is not included in BDS but is reported as a supplementary indicator.
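A sketch of the supplementary check, guarding against a zero reference rate (an edge case the formula itself leaves undefined):

    def disparate_impact_ratio(pr_a: float, pr_b: float) -> float:
        # PR_B / PR_A; undefined when the reference group's rate is zero.
        if pr_a == 0:
            raise ValueError("reference group positive rate must be > 0")
        return pr_b / pr_a

    # Example: disparate_impact_ratio(0.60, 0.45) -> 0.75, which falls
    # below the 0.80 threshold, so potential adverse impact is flagged.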

Severity Thresholds (BDS)

BDS Range       Classification
0.00 – 0.049    Minimal / No Detectable Bias
0.05 – 0.099    Low Bias
0.10 – 0.199    Moderate Bias
0.20 – 0.349    High Bias
≥ 0.35          Severe Bias
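One way to encode the table, assuming the bands are contiguous and half-open (each upper bound is exclusive up to the start of the next band):

    def classify_bds(bds: float) -> str:
        # Map a BDS value onto the severity bands in the table above.
        if bds < 0.05:
            return "Minimal / No Detectable Bias"
        if bds < 0.10:
            return "Low Bias"
        if bds < 0.20:
            return "Moderate Bias"
        if bds < 0.35:
            return "High Bias"
        return "Severe Bias"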

Assumptions & References

  • Group A is treated as the reference group (e.g., majority or privileged group); Group B is the comparison group (e.g., minority or protected group). The choice of reference group affects DIR but not BDS.
  • All rate inputs (TPR, FPR, PR, CAL) must lie in [0, 1]. TPR ≥ FPR is expected for any model performing above random chance.
  • Weights w₁, w₂, w₃ must be non-negative and sum to exactly 1.0. Adjust weights to prioritize specific fairness criteria based on domain context.
  • BDS assumes equal base rates across groups. If base rates differ substantially, calibration difference may reflect real-world prevalence rather than model bias — interpret CD with caution in such cases.
  • This calculator evaluates group fairness (statistical parity), not individual fairness. These two notions can be in tension (Chouldechova, 2017).
  • Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS 2016.
  • Chouldechova, A. (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153–163.
  • EEOC Uniform Guidelines (1978). 4/5ths (80%) rule for adverse impact — Disparate Impact Ratio threshold of 0.80.
  • Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press.
  • BDS severity thresholds are adapted from common fairness audit practice; thresholds may vary by regulatory context, domain risk level, and organizational policy.
