Discernibility

`Discernibility`

class Discernibility

Bases: object

Discernibility Metric (DM).

The Discernibility Metric measures how ambiguous the records in the data are. It assigns each record a penalty equal to the size of the equivalence class it belongs to, so smaller equivalence classes result in a lower (better) score. Each suppressed record is penalized by the total size of the data.

\[DM = \sum_{\text{all EQs}} |EQ|^2 + |S| \cdot |D|\]

where \(|EQ|\) is the size of an equivalence class, \(|S|\) is the number of suppressed records, and \(|D|\) is the size of the data (the total number of records).
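As a worked example, take \(|D| = 8\) records split into two equivalence classes of size 3, with the remaining \(|S| = 2\) records suppressed:

\[DM = 3^2 + 3^2 + 2 \cdot 8 = 9 + 9 + 16 = 34\]

Each record in an equivalence class contributes \(|EQ|\) to the sum, so a class of size 3 contributes \(3 \cdot 3 = 9\). Each suppressed record contributes \(|D|\), the penalty it would receive if all records were indistinguishable.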

Methods

`calculate`	Calculate the discernibility from the data.
`calculate_best_effort`	Calculate the best-effort discernibility based on k.
`calculate_from_equivalence_classes`	Calculate the discernibility from equivalence classes.

static calculate(data: DataFrame | ndarray, qids_idx: list, suppression_counts: int = 0)

Calculate the discernibility from the data.

Parameters:

data (DataFrame or ndarray) – The data to inspect.
qids_idx (list) – The column indices of the QID attributes.
suppression_counts (int, default 0) – The number of suppressed records.

Returns:

float – The calculated discernibility.
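The computation behind `calculate` can be sketched with pandas: group the rows on the QID columns, square each group size, and add the suppression penalty. This is a minimal sketch, not the library's implementation. The function name `discernibility_from_data` is hypothetical, and the sketch assumes that `data` holds only the released (non-suppressed) records, so that \(|D|\) is `len(data) + suppression_counts`.

```python
import numpy as np
import pandas as pd


def discernibility_from_data(data, qids_idx, suppression_counts=0):
    """Hypothetical sketch of DM = sum(|EQ|^2) + |S| * |D|."""
    # Accept either a DataFrame or a 2-D ndarray (ndarray columns become 0..n-1).
    df = pd.DataFrame(data)
    qid_cols = [df.columns[i] for i in qids_idx]
    # Each distinct combination of QID values forms one equivalence class;
    # dropna=False keeps rows with missing QID values instead of dropping them.
    eq_sizes = df.groupby(qid_cols, dropna=False).size()
    # Assumption: suppressed records are not present in `data`, so the
    # original data size is the released rows plus the suppressed ones.
    n_total = len(df) + suppression_counts
    return float((eq_sizes ** 2).sum() + suppression_counts * n_total)


df = pd.DataFrame({
    "age": ["30-39", "30-39", "40-49"],
    "zip": ["130**", "130**", "148**"],
    "disease": ["flu", "cold", "flu"],
})
print(discernibility_from_data(df, qids_idx=[0, 1]))                        # 2^2 + 1^2 = 5.0
print(discernibility_from_data(df, qids_idx=[0, 1], suppression_counts=1))  # 5 + 1 * 4 = 9.0

arr = np.array([["a", "x"], ["a", "x"], ["b", "y"]])
print(discernibility_from_data(arr, qids_idx=[0, 1]))                       # 5.0
```

Grouping on the QID columns alone is what makes non-QID attributes such as `disease` irrelevant to the score.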

static calculate_best_effort(org_data: DataFrame, k: int = 1)

Calculate the best-effort discernibility based on k.

When the data size \(|D|\) is divisible by k, the best discernibility (DM) is \(\frac{|D|}{k} \cdot k^2\).
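In the divisible case the expression simplifies, because each of the \(\frac{|D|}{k}\) equivalence classes contributes \(k^2\):

\[DM_{best} = \frac{|D|}{k} \cdot k^2 = |D| \cdot k\]

For example, \(|D| = 12\) and \(k = 3\) give four EQs of size 3, so \(DM_{best} = 4 \cdot 9 = 36 = 12 \cdot 3\).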

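Otherwise, let \(R = |D| \bmod k\) be the number of remainder records. The best DM is achieved when each remainder record is added to a different equivalence class (EQ). This yields \(\lfloor |D|/k \rfloor - R\) EQs of size \(k\) and \(R\) EQs of size \(k + 1\), so \(DM_{best} = (\lfloor |D|/k \rfloor - R) \cdot k^2 + R \cdot (k + 1)^2\). This construction requires \(R \le \lfloor |D|/k \rfloor\), so that there are enough EQs to absorb the remainder records.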
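The best-effort value depends only on \(|D|\) and k, so it can be computed without inspecting the records. The sketch below is hypothetical: `best_effort_discernibility` is not the library's API and takes the record count rather than a DataFrame. It follows the construction above when \(R \le \lfloor |D|/k \rfloor\). For \(R > \lfloor |D|/k \rfloor\), where \(\lfloor |D|/k \rfloor - R\) would be negative, it assumes the records are spread as evenly as possible over \(\lfloor |D|/k \rfloor\) classes; that branch is an extension not described above and may differ from what the library does.

```python
def best_effort_discernibility(n_records: int, k: int = 1) -> float:
    """Lowest DM achievable when every equivalence class has at least k records."""
    if k < 1 or n_records < k:
        raise ValueError("need 1 <= k <= n_records")
    n_classes, remainder = divmod(n_records, k)  # floor(|D|/k) and R
    if remainder <= n_classes:
        # Documented construction: (floor(|D|/k) - R) EQs of size k, R EQs of size k + 1.
        return float((n_classes - remainder) * k**2 + remainder * (k + 1) ** 2)
    # Assumed extension: split the records as evenly as possible over floor(|D|/k) classes.
    base, extra = divmod(n_records, n_classes)
    return float((n_classes - extra) * base**2 + extra * (base + 1) ** 2)


print(best_effort_discernibility(12, k=3))  # 4 EQs of size 3 -> 36.0
print(best_effort_discernibility(10, k=3))  # 2 EQs of size 3, 1 EQ of size 4 -> 34.0
print(best_effort_discernibility(5, k=3))   # R = 2 > 1 class, so one EQ of size 5 -> 25.0
```

With k = 1 every record can form its own class, so the result is simply \(|D|\).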

Parameters:

org_data (DataFrame) – The original data to inspect.
k (int, default 1) – The privacy parameter k.

Returns:

float – The calculated best-effort discernibility.

static calculate_from_equivalence_classes(equivalence_classes: list, suppression_counts: int = 0)

Calculate the discernibility from equivalence classes.

Parameters:

equivalence_classes (list[{qid, count}]) – A list of dictionaries, where each dictionary contains a ‘count’ key representing the size of an equivalence class.
suppression_counts (int, default 0) – The number of suppressed records.

Returns:

float – The calculated discernibility.
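The same metric can be sketched from the `equivalence_classes` input. Only the `count` key is documented above; the `qid` values and the function name `discernibility_from_classes` are hypothetical. Because this method receives no separate data size, the sketch assumes \(|D|\) is the sum of all class counts plus `suppression_counts`.

```python
def discernibility_from_classes(equivalence_classes, suppression_counts=0):
    """Hypothetical sketch of DM computed from a list of {'qid', 'count'} dicts."""
    # Sum of squared equivalence-class sizes, read from each dict's 'count' key.
    eq_penalty = sum(ec["count"] ** 2 for ec in equivalence_classes)
    # Assumption: |D| = records in all classes + suppressed records.
    n_total = sum(ec["count"] for ec in equivalence_classes) + suppression_counts
    return float(eq_penalty + suppression_counts * n_total)


classes = [
    {"qid": ("30-39", "130**"), "count": 4},  # hypothetical QID values
    {"qid": ("40-49", "148**"), "count": 3},
    {"qid": ("50-59", "130**"), "count": 2},
]
print(discernibility_from_classes(classes, suppression_counts=1))  # 16 + 9 + 4 + 1 * 10 = 39.0
```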
