Discernibility

class Discernibility

Bases: object

Discernibility Metric (DM).

The Discernibility Metric measures how indistinguishable the records in the data are. It assigns each record a penalty equal to the size of the equivalence class it belongs to, so smaller equivalence classes yield a lower (better) score. Each suppressed record is penalized by the total size of the data.

\[DM = \sum_{EQ} |EQ|^2 + |S| \cdot |D|\]

where the sum runs over all equivalence classes, \(|EQ|\) is the size of an equivalence class, \(|S|\) is the number of suppressed records, and \(|D|\) is the size of the data.
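
As a worked illustration with hypothetical numbers, assume three equivalence classes of sizes 2, 3, and 5, and one suppressed record. Assume also that \(|D| = 11\), that is, that the data size counts the suppressed record; the description above does not state whether it does. The formula then gives:

\[DM = 2^2 + 3^2 + 5^2 + 1 \cdot 11 = 4 + 9 + 25 + 11 = 49\]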

Methods

  • calculate – Calculate the discernibility from the data.

  • calculate_best_effort – Calculate the best-effort discernibility based on k.

  • calculate_from_equivalence_classes – Calculate the discernibility from equivalence classes.

static calculate(data: DataFrame | ndarray, qids_idx: list, suppression_counts: int = 0)

Calculate the discernibility from the data.

Parameters:
  • data (DataFrame or ndarray) – The data to inspect.

  • qids_idx (list) – The column indices of the QID attributes.

  • suppression_counts (int, default 0) – The number of suppressed records.

Returns:

float – The calculated discernibility.
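
The computation described above can be approximated with a minimal sketch that is not the library's implementation. The function name `discernibility_sketch` is hypothetical, a plain list of tuples stands in for the DataFrame or ndarray, and \(|D|\) is assumed to be the number of retained rows plus the suppressed records.

```python
from collections import Counter


def discernibility_sketch(rows, qids_idx, suppression_counts=0):
    """Hypothetical stand-in for the DM formula; not the library's code."""
    # Key each record by its QID values; records with equal keys share an EQ.
    eq_sizes = Counter(tuple(row[i] for i in qids_idx) for row in rows)
    # Assumption: |D| counts the retained rows plus the suppressed records.
    data_size = len(rows) + suppression_counts
    # Sum of squared EQ sizes, plus |S| * |D| for the suppressed records.
    return float(sum(n * n for n in eq_sizes.values())
                 + suppression_counts * data_size)


# Illustrative records: columns 0 and 1 are the QIDs, column 2 is sensitive.
rows = [
    (30, "A", "flu"),
    (30, "A", "cold"),
    (40, "B", "flu"),
    (40, "B", "asthma"),
    (40, "B", "flu"),
]
score = discernibility_sketch(rows, qids_idx=[0, 1])
print(score)  # two EQs of sizes 2 and 3: 2**2 + 3**2 = 13.0
```

Grouping on the QID columns only is what makes two records with different sensitive values fall into the same equivalence class.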

static calculate_best_effort(org_data: DataFrame, k: int = 1)

Calculate the best-effort discernibility based on k.

When the data size (\(|D|\)) is divisible by k, the data can be split into \(\frac{|D|}{k}\) equivalence classes of size \(k\), so the best discernibility (DM) is \(\frac{|D|}{k} \cdot k^2 = |D| \cdot k\).

Otherwise, let \(R = |D| \bmod k\) be the number of remainder records. The best DM is achieved when each remainder record joins a different equivalence class (EQ). This results in \(\lfloor \frac{|D|}{k} \rfloor - R\) EQs of size \(k\) and \(R\) EQs of size \(k + 1\), that is, \(DM = (\lfloor \frac{|D|}{k} \rfloor - R) \cdot k^2 + R \cdot (k + 1)^2\).

Parameters:
  • org_data (DataFrame) – The data to inspect.

  • k (int, default 1) – The privacy parameter k.

Returns:

float – The calculated best-effort discernibility.
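
The two cases above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `best_effort_discernibility_sketch` is hypothetical, it takes the data size directly instead of a DataFrame, and it rejects inputs where the remainder exceeds the number of full groups, since the description above does not cover that case.

```python
def best_effort_discernibility_sketch(data_size: int, k: int = 1) -> float:
    """Hypothetical sketch of the best-effort DM for |D| records and a given k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # full_groups = floor(|D| / k), remainder = |D| mod k
    full_groups, remainder = divmod(data_size, k)
    if remainder == 0:
        # |D| / k equivalence classes, each of size exactly k.
        return float(full_groups * k ** 2)
    if remainder > full_groups:
        # Assumption: the documented formula needs one distinct EQ per
        # remainder record, so this situation is outside its scope.
        raise ValueError("remainder exceeds the number of full groups")
    # (full_groups - R) EQs of size k and R EQs of size k + 1.
    return float((full_groups - remainder) * k ** 2
                 + remainder * (k + 1) ** 2)


print(best_effort_discernibility_sketch(10, k=5))  # 2 * 5**2 = 50.0
print(best_effort_discernibility_sketch(11, k=5))  # 1 * 5**2 + 1 * 6**2 = 61.0
```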

static calculate_from_equivalence_classes(equivalence_classes: list, suppression_counts: int = 0)

Calculate the discernibility from equivalence classes.

Parameters:
  • equivalence_classes (list[{qid, count}]) – A list of dictionaries, where each dictionary contains a ‘count’ key representing the size of an equivalence class.

  • suppression_counts (int, default 0) – The number of suppressed records.

Returns:

float – The calculated discernibility.
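
When the equivalence classes are already known, the same formula can be applied to their counts. The sketch below is a hypothetical stand-in, not the library's implementation: the name `discernibility_from_classes_sketch` and the example 'qid' values are illustrative, and \(|D|\) is assumed to be the sum of all counts plus the suppressed records.

```python
def discernibility_from_classes_sketch(equivalence_classes, suppression_counts=0):
    """Hypothetical sketch: DM from a list of {'qid': ..., 'count': ...} dicts."""
    # Only the 'count' of each equivalence class enters the formula.
    sizes = [eq["count"] for eq in equivalence_classes]
    # Assumption: |D| is the number of retained plus suppressed records.
    data_size = sum(sizes) + suppression_counts
    return float(sum(n * n for n in sizes) + suppression_counts * data_size)


# Illustrative input; the 'qid' values are made up for this example.
classes = [
    {"qid": (30, "A"), "count": 2},
    {"qid": (40, "B"), "count": 3},
]
print(discernibility_from_classes_sketch(classes))  # 2**2 + 3**2 = 13.0
print(discernibility_from_classes_sketch(classes, suppression_counts=2))  # 13 + 2 * 7 = 27.0
```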