fairlearn.metrics package

Functionality for computing metrics, with a particular focus on group metrics.

For our purposes, a metric is a function with signature f(y_true, y_pred, ...), where y_true is the set of true values and y_pred is the set of values predicted by a machine learning algorithm. Other arguments may be present (most often sample weights) and affect how the metric is calculated.

The group metrics in this module have signatures g(y_true, y_pred, group_membership, ...), where group_membership is an array of values indicating the group to which each pair of true and predicted values belongs. The metric is evaluated for the entire set of data, and also for each subgroup identified in group_membership.
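The pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names `accuracy` and `evaluate_by_group` are hypothetical and do not exist in fairlearn.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Plain metric f(y_true, y_pred): fraction of matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_by_group(metric, y_true, y_pred, group_membership):
    """Hypothetical group metric: value on all data plus one value per group."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, group_membership):
        buckets[g][0].append(t)  # true values of group g
        buckets[g][1].append(p)  # predicted values of group g
    by_group = {g: metric(ts, ps) for g, (ts, ps) in buckets.items()}
    return metric(y_true, y_pred), by_group


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]

overall, by_group = evaluate_by_group(accuracy, y_true, y_pred, groups)
print(overall, by_group)
```

Here the overall accuracy and both per-group accuracies are 2/3, since each group contains exactly one misclassified sample.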

fairlearn.metrics.demographic_parity_difference(y_true, y_pred, *, sensitive_features, sample_weight=None)

Calculate the demographic parity difference.

Parameters
  • y_true (1D-array) – Ground truth (correct) labels.

  • y_pred (1D-array) – Predicted labels \(h(X)\) returned by the classifier.

  • sensitive_features (1D-array) – Sensitive features.

  • sample_weight (1D-array) – Sample weights.

Returns

The difference between the largest and the smallest group-level selection rate, \(E[h(X) | A=a]\), across all values \(a\) of the sensitive feature. A demographic parity difference of 0 means that all groups have the same selection rate.
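As a worked example of this definition, the following sketch computes unweighted group-level selection rates by hand and takes the spread between them. It assumes binary predictions where 1 is the selected outcome; the helper `selection_rates` is hypothetical, and the true labels are omitted because the quantity \(E[h(X) | A=a]\) depends only on the predictions.

```python
def selection_rates(y_pred, sensitive_features):
    """Fraction of predictions equal to 1 within each sensitive-feature value."""
    counts = {}
    for p, a in zip(y_pred, sensitive_features):
        selected, total = counts.get(a, (0, 0))
        counts[a] = (selected + (p == 1), total + 1)
    return {a: s / n for a, (s, n) in counts.items()}


y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
sensitive = ["f", "f", "f", "f", "m", "m", "m", "m"]

rates = selection_rates(y_pred, sensitive)  # {"f": 0.75, "m": 0.25}
dp_difference = max(rates.values()) - min(rates.values())
print(dp_difference)  # 0.5
```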

fairlearn.metrics.demographic_parity_ratio(y_true, y_pred, *, sensitive_features, sample_weight=None)

Calculate the demographic parity ratio.

Parameters
  • y_true (1D-array) – Ground truth (correct) labels.

  • y_pred (1D-array) – Predicted labels \(h(X)\) returned by the classifier.

  • sensitive_features (1D-array) – Sensitive features.

  • sample_weight (1D-array) – Sample weights.

Returns

The ratio between the smallest and the largest group-level selection rate, \(E[h(X) | A=a]\), across all values \(a\) of the sensitive feature. A demographic parity ratio of 1 means that all groups have the same selection rate.
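A short arithmetic sketch of this definition, assuming the group-level selection rates have already been computed (the values below are made up for illustration):

```python
# Hypothetical, precomputed selection rates per sensitive-feature value.
rates = {"f": 0.75, "m": 0.25}

# Smallest rate divided by largest rate, so the result lies in [0, 1]
# whenever the rates are non-negative and the largest one is positive.
dp_ratio = min(rates.values()) / max(rates.values())
print(dp_ratio)
```

With these rates the ratio is one third: group "m" is selected a third as often as group "f".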

fairlearn.metrics.difference_from_summary(summary)

Calculate the difference between the maximum and minimum metric value across groups.

Parameters

summary – A group metric summary

Returns

The difference between the maximum and the minimum group-level metrics described in summary.

Return type

float

fairlearn.metrics.equalized_odds_difference(y_true, y_pred, *, sensitive_features, sample_weight=None)

Calculate the equalized odds difference.

Parameters
  • y_true (1D-array) – Ground truth (correct) labels \(Y\).

  • y_pred (1D-array) – Predicted labels \(h(X)\) returned by the classifier.

  • sensitive_features (1D-array) – Sensitive features.

  • sample_weight (1D-array) – Sample weights.

Returns

The greater of two metrics: true_positive_rate_difference and false_positive_rate_difference. The former is the difference between the largest and smallest of \(P[h(X)=1 | A=a, Y=1]\), across all values \(a\) of the sensitive feature. The latter is defined similarly, but for \(P[h(X)=1 | A=a, Y=0]\). An equalized odds difference of 0 means that all groups have the same true positive, true negative, false positive, and false negative rates.
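The two conditional probabilities in this definition can be estimated directly from the samples. The following is a minimal unweighted sketch for binary labels, not the library's implementation; the helper `positive_prediction_rate` is hypothetical.

```python
def positive_prediction_rate(y_true, y_pred, sensitive, group, label):
    """Estimate P[h(X)=1 | A=group, Y=label] from the samples.

    Raises ZeroDivisionError if no sample has this group and label.
    """
    preds = [p for t, p, a in zip(y_true, y_pred, sensitive)
             if a == group and t == label]
    return sum(preds) / len(preds)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]
groups = sorted(set(sensitive))

# Rates conditioned on Y=1 and on Y=0, one value per group.
rates_y1 = [positive_prediction_rate(y_true, y_pred, sensitive, g, 1) for g in groups]
rates_y0 = [positive_prediction_rate(y_true, y_pred, sensitive, g, 0) for g in groups]

diff_y1 = max(rates_y1) - min(rates_y1)  # spread of P[h(X)=1 | A=a, Y=1]
diff_y0 = max(rates_y0) - min(rates_y0)  # spread of P[h(X)=1 | A=a, Y=0]
eo_difference = max(diff_y1, diff_y0)
print(eo_difference)  # 0.5
```

In this data the groups agree on the rate conditioned on Y=0 (0.5 each) but differ on the rate conditioned on Y=1 (1.0 versus 0.5), so the larger spread, 0.5, is reported.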

fairlearn.metrics.equalized_odds_ratio(y_true, y_pred, *, sensitive_features, sample_weight=None)

Calculate the equalized odds ratio.

Parameters
  • y_true (1D-array) – Ground truth (correct) labels \(Y\).

  • y_pred (1D-array) – Predicted labels \(h(X)\) returned by the classifier.

  • sensitive_features (1D-array) – Sensitive features.

  • sample_weight (1D-array) – Sample weights.

Returns

The smaller of two metrics: true_positive_rate_ratio and false_positive_rate_ratio. The former is the ratio between the smallest and largest of \(P[h(X)=1 | A=a, Y=1]\), across all values \(a\) of the sensitive feature. The latter is defined similarly, but for \(P[h(X)=1 | A=a, Y=0]\). An equalized odds ratio of 1 means that all groups have the same true positive, true negative, false positive, and false negative rates.

fairlearn.metrics.false_negative_rate(y_true, y_pred, sample_weight=None, pos_label=None)

Calculate the false negative rate (also called miss rate).

Parameters
  • y_true (array-like) – The list of true values

  • y_pred (array-like) – The list of predicted values

  • sample_weight (array-like, optional) – A list of weights to apply to each sample. By default all samples are weighted equally

  • pos_label (scalar, optional) – The value to treat as the ‘positive’ label in the samples. If None (the default) then the largest unique value of the y arrays will be used.

Returns

The false negative rate for the data

Return type

float
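To make the roles of sample_weight and pos_label concrete, here is a minimal pure-Python sketch of a weighted false negative rate. It follows the parameter descriptions above (equal weights by default, largest unique label as the positive label by default) but is an illustration, not the library's code; the name `false_negative_rate_sketch` is hypothetical.

```python
def false_negative_rate_sketch(y_true, y_pred, sample_weight=None, pos_label=None):
    """Weighted share of truly positive samples that were not predicted positive."""
    if pos_label is None:
        # Default described above: largest unique value of the y arrays.
        pos_label = max(set(y_true) | set(y_pred))
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    missed = sum(w for t, p, w in zip(y_true, y_pred, sample_weight)
                 if t == pos_label and p != pos_label)
    positives = sum(w for t, w in zip(y_true, sample_weight) if t == pos_label)
    return missed / positives


y_true = [1, 1, 1, 0]
y_pred = [1, 0, 0, 0]

fnr_unweighted = false_negative_rate_sketch(y_true, y_pred)
fnr_weighted = false_negative_rate_sketch(y_true, y_pred, sample_weight=[1, 1, 2, 1])
print(fnr_unweighted, fnr_weighted)
```

Two of the three positive samples are missed, giving 2/3 unweighted. Doubling the weight of one missed sample raises the rate to 3/4.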

fairlearn.metrics.false_positive_rate(y_true, y_pred, sample_weight=None, pos_label=None)

Calculate the false positive rate (also called fall-out).

Parameters
  • y_true (array-like) – The list of true values

  • y_pred (array-like) – The list of predicted values

  • sample_weight (array-like, optional) – A list of weights to apply to each sample. By default all samples are weighted equally

  • pos_label (scalar, optional) – The value to treat as the ‘positive’ label in the samples. If None (the default) then the largest unique value of the y arrays will be used.

Returns

The false positive rate for the data

Return type

float

fairlearn.metrics.group_max_from_summary(summary)

Retrieve the maximum group-level metric value from group summary.

Parameters

summary – A group metric summary

Returns

The maximum group-level metric value across all groups in summary.

Return type

float

fairlearn.metrics.group_min_from_summary(summary)

Retrieve the minimum group-level metric value from group summary.

Parameters

summary – A group metric summary

Returns

The minimum group-level metric value across all groups in summary.

Return type

float
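The summary-based helpers reduce a set of group-level values to a single number. The sketch below mimics that on a stand-in object. It assumes, for illustration only, that a summary exposes a `by_group` mapping from group to metric value; `SimpleNamespace` is used here as a hypothetical substitute for the real summary type.

```python
from types import SimpleNamespace

# Stand-in for a group metric summary (assumed shape, made-up values).
summary = SimpleNamespace(
    overall=0.5,
    by_group={"a": 0.75, "b": 0.5, "c": 0.25},
)

group_max = max(summary.by_group.values())  # largest group-level value
group_min = min(summary.by_group.values())  # smallest group-level value
difference = group_max - group_min          # spread across groups
print(group_max, group_min, difference)     # 0.75 0.25 0.5
```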

fairlearn.metrics.group_summary(metric_function, y_true, y_pred, *, sensitive_features, indexed_params=None, **metric_params)

Apply a metric to each subgroup of a set of data.

Parameters
  • metric_function – Function with signature metric_function(y_true, y_pred, **metric_params)

  • y_true – Array of ground-truth values

  • y_pred – Array of predicted values

  • sensitive_features – Array indicating the group to which each input value belongs

  • indexed_params – Names of metric_function parameters that should be split according to sensitive_features in addition to y_true and y_pred. Defaults to None corresponding to {"sample_weight"}.

  • **metric_params – Optional arguments to be passed to the metric_function

Returns

Object containing the result of applying metric_function to the entire dataset and to each group identified in sensitive_features

Return type

sklearn.utils.Bunch with the fields overall and by_group
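The role of indexed_params is easiest to see in code: parameters named there are sliced per group alongside y_true and y_pred, while all other metric parameters are passed through unchanged. The following is a minimal sketch under stated assumptions, not the library's implementation. `group_summary_sketch` and `weighted_accuracy` are hypothetical names, `SimpleNamespace` stands in for `sklearn.utils.Bunch`, and storing `by_group` as a plain dict is an assumption.

```python
from types import SimpleNamespace


def group_summary_sketch(metric_function, y_true, y_pred, *, sensitive_features,
                         indexed_params=None, **metric_params):
    if indexed_params is None:
        indexed_params = {"sample_weight"}  # default described above
    overall = metric_function(y_true, y_pred, **metric_params)
    by_group = {}
    for group in sorted(set(sensitive_features)):
        idx = [i for i, a in enumerate(sensitive_features) if a == group]
        # Slice only the indexed parameters; pass the rest through as-is.
        params = {
            name: [value[i] for i in idx] if name in indexed_params else value
            for name, value in metric_params.items()
        }
        by_group[group] = metric_function(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], **params
        )
    return SimpleNamespace(overall=overall, by_group=by_group)


def weighted_accuracy(y_true, y_pred, sample_weight):
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


summary = group_summary_sketch(
    weighted_accuracy,
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    sensitive_features=["a", "a", "b", "b"],
    sample_weight=[1, 3, 2, 2],
)
print(summary.overall, summary.by_group)  # 0.625 {'a': 0.25, 'b': 1.0}
```

Without slicing sample_weight per group, the weights would no longer line up with the group's samples, which is why it is treated as an indexed parameter by default.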

fairlearn.metrics.make_derived_metric(transformation_function, summary_function, name=None)

Make a callable that calculates a derived metric from the group summary.

Parameters
  • transformation_function (func) – A transformation function with the signature transformation_function(summary)

  • summary_function (func) – A metric group summary function with the signature summary_function(y_true, y_pred, *, sensitive_features, **metric_params)

Returns

A callable object with the signature derived_metric(y_true, y_pred, *, sensitive_features, **metric_params)

Return type

func
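Conceptually, a derived metric is the composition of a summary function with a transformation of its result. The closure below is a minimal sketch of that idea based only on the signatures documented above; it is not the library's implementation, it ignores the name parameter, and `make_derived_metric_sketch`, `selection_rate_summary`, and `spread` are hypothetical names.

```python
from types import SimpleNamespace


def make_derived_metric_sketch(transformation_function, summary_function):
    """Return a callable that applies the transformation to the group summary."""
    def derived_metric(y_true, y_pred, *, sensitive_features, **metric_params):
        summary = summary_function(
            y_true, y_pred, sensitive_features=sensitive_features, **metric_params
        )
        return transformation_function(summary)
    return derived_metric


def selection_rate_summary(y_true, y_pred, *, sensitive_features):
    """Toy summary function: mean of 0/1 predictions overall and per group."""
    by_group = {}
    for group in set(sensitive_features):
        preds = [p for p, a in zip(y_pred, sensitive_features) if a == group]
        by_group[group] = sum(preds) / len(preds)
    return SimpleNamespace(overall=sum(y_pred) / len(y_pred), by_group=by_group)


def spread(summary):
    """Toy transformation: largest minus smallest group-level value."""
    values = list(summary.by_group.values())
    return max(values) - min(values)


selection_rate_spread = make_derived_metric_sketch(spread, selection_rate_summary)
result = selection_rate_spread(
    [0, 1, 0, 1], [1, 1, 0, 1], sensitive_features=["a", "a", "b", "b"]
)
print(result)  # 0.5
```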

fairlearn.metrics.make_metric_group_summary(metric_function, indexed_params=None, name=None)

Make a callable that calculates the group summary of a metric.

Parameters
  • metric_function (func) – A metric function with the signature metric_function(y_true, y_pred, **metric_params)

  • indexed_params – The names of parameters of metric_function that should be split according to sensitive_features in addition to y_true and y_pred. Defaults to None corresponding to ['sample_weight'].

Returns

A callable object with the signature metric_group_summary(y_true, y_pred, *, sensitive_features, **metric_params)

Return type

func

fairlearn.metrics.mean_prediction(y_true, y_pred, sample_weight=None)

Calculate the (weighted) mean prediction.

The true values are ignored, but are required as an argument in order to maintain a consistent interface.
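A minimal sketch of a weighted mean prediction, written in plain Python for illustration (the name `mean_prediction_sketch` is hypothetical, and this is not the library's code):

```python
def mean_prediction_sketch(y_true, y_pred, sample_weight=None):
    """Weighted mean of y_pred; y_true is accepted but never read."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_pred)
    weighted_sum = sum(p * w for p, w in zip(y_pred, sample_weight))
    return weighted_sum / sum(sample_weight)


y_pred = [0.25, 0.5, 0.75]
unweighted = mean_prediction_sketch(None, y_pred)                     # 0.5
weighted = mean_prediction_sketch(None, y_pred, sample_weight=[2, 1, 1])  # 0.4375
print(unweighted, weighted)
```

Passing None for y_true works in this sketch precisely because the argument is ignored; it is kept only so the function has the same (y_true, y_pred, ...) shape as other metrics.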

fairlearn.metrics.ratio_from_summary(summary)

Calculate the ratio between the minimum and maximum metric value across groups.

Parameters

summary – A group metric summary

Returns

The ratio between the minimum and the maximum group-level metrics described in summary.

Return type

float

fairlearn.metrics.selection_rate(y_true, y_pred, *, pos_label=1, sample_weight=None)

Calculate the fraction of predicted labels matching the ‘good’ outcome.

The argument pos_label specifies the ‘good’ outcome.
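The sketch below illustrates how pos_label and sample_weight could enter the calculation. It is a plain-Python illustration with a hypothetical name, `selection_rate_sketch`, and assumes the rate is the weighted share of predictions equal to pos_label.

```python
def selection_rate_sketch(y_true, y_pred, *, pos_label=1, sample_weight=None):
    """Weighted fraction of predictions equal to pos_label; y_true is unused."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_pred)
    selected = sum(w for p, w in zip(y_pred, sample_weight) if p == pos_label)
    return selected / sum(sample_weight)


y_pred = ["approve", "deny", "approve", "deny"]

rate = selection_rate_sketch(None, y_pred, pos_label="approve")  # 0.5
weighted_rate = selection_rate_sketch(
    None, y_pred, pos_label="approve", sample_weight=[1, 1, 1, 5]
)  # 0.25
print(rate, weighted_rate)
```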

fairlearn.metrics.true_negative_rate(y_true, y_pred, sample_weight=None, pos_label=None)

Calculate the true negative rate (also called specificity or selectivity).

Parameters
  • y_true (array-like) – The list of true values

  • y_pred (array-like) – The list of predicted values

  • sample_weight (array-like, optional) – A list of weights to apply to each sample. By default all samples are weighted equally

  • pos_label (scalar, optional) – The value to treat as the ‘positive’ label in the samples. If None (the default) then the largest unique value of the y arrays will be used.

Returns

The true negative rate for the data

Return type

float

fairlearn.metrics.true_positive_rate(y_true, y_pred, sample_weight=None, pos_label=None)

Calculate the true positive rate (also called sensitivity, recall, or hit rate).

Parameters
  • y_true (array-like) – The list of true values

  • y_pred (array-like) – The list of predicted values

  • sample_weight (array-like, optional) – A list of weights to apply to each sample. By default all samples are weighted equally

  • pos_label (scalar, optional) – The value to treat as the ‘positive’ label in the samples. If None (the default) then the largest unique value of the y arrays will be used.

Returns

The true positive rate for the data

Return type

float
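The four rate metrics in this module are tied together by the confusion counts: the true positive and false negative rates sum to 1, as do the true negative and false positive rates. The unweighted sketch below shows this with a hypothetical helper, `confusion_rates`; it is an illustration, not the library's implementation.

```python
def confusion_rates(y_true, y_pred, pos_label=1):
    """Compute all four rates from unweighted confusion counts."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == pos_label:
            if p == pos_label:
                tp += 1
            else:
                fn += 1
        elif p == pos_label:
            fp += 1
        else:
            tn += 1
    return {
        "true_positive_rate": tp / (tp + fn),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (fp + tn),
    }


rates = confusion_rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0])
print(rates)
```

With three of four positives and three of four negatives classified correctly, the true positive and true negative rates are both 0.75 and their complements are both 0.25.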