Discernibility
- class Discernibility
Bases: object
Discernibility Metric (DM).
The Discernibility Metric measures the degree of ambiguity in the data. It assigns each record a penalty equal to the size of the equivalence class it belongs to, and each suppressed record a penalty equal to the total size of the data. Smaller equivalence classes therefore yield a lower (better) score.
\[DM = \sum_{\text{all EQs}} |EQ|^2 + |S| \cdot |D|\]
where \(|EQ|\) is the size of an equivalence class, \(|S|\) is the number of suppressed records, and \(|D|\) is the size of the data.
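As a worked illustration of the formula, consider a hypothetical dataset of 10 records in which 2 records are suppressed and the remaining 8 fall into equivalence classes of sizes 4, 2, and 2. This assumes that \(|D|\) counts every record, including the suppressed ones:
\[DM = 4^2 + 2^2 + 2^2 + 2 \cdot 10 = 16 + 4 + 4 + 20 = 44\]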
Methods
calculate – Calculate the discernibility from the data.
calculate_best_effort – Calculate the best-effort discernibility based on k.
calculate_from_equivalence_classes – Calculate the discernibility from equivalence classes.
- static calculate(data: DataFrame | ndarray, qids_idx: list, suppression_counts: int = 0)
Calculate the discernibility from the data.
- Parameters:
data (DataFrame or ndarray) – The data to inspect.
qids_idx (list) – The column indices of the QID attributes.
suppression_counts (int, default 0) – The number of suppressed records.
- Returns:
float – The calculated discernibility.
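The computation described above can be approximated with a short pandas sketch. This is not the library's implementation: the function name `discernibility_sketch` and the sample data are hypothetical, only DataFrame input is handled, and the sketch assumes that `data` holds the retained records so that \(|D|\) equals `len(data) + suppression_counts`.

```python
import pandas as pd


def discernibility_sketch(data: pd.DataFrame, qids_idx: list,
                          suppression_counts: int = 0) -> float:
    """Illustrative DM computation; not the library's own code."""
    # Resolve the positional QID indices to column labels.
    qid_cols = list(data.columns[qids_idx])
    # Records sharing the same QID values form one equivalence class.
    # dropna=False keeps rows whose QID values are missing.
    eq_sizes = data.groupby(qid_cols, dropna=False).size()
    # Assumption: |D| = retained records + suppressed records.
    total = len(data) + suppression_counts
    return float((eq_sizes ** 2).sum() + suppression_counts * total)


# Hypothetical generalized table: columns 0 and 1 are the QIDs.
df = pd.DataFrame({
    "age": ["20-30", "20-30", "30-40", "30-40", "30-40"],
    "zip": ["130**", "130**", "148**", "148**", "148**"],
    "disease": ["flu", "cold", "flu", "asthma", "cold"],
})
dm = discernibility_sketch(df, qids_idx=[0, 1], suppression_counts=1)
```

Here the two equivalence classes have sizes 2 and 3, and the single suppressed record is charged \(|D| = 6\) under the stated assumption.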
- static calculate_best_effort(org_data: DataFrame, k: int = 1)
Calculate the best-effort discernibility based on k.
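When the data size (\(|D|\)) is divisible by k, the best discernibility (DM) is obtained with \(\frac{|D|}{k}\) equivalence classes (EQs) of size \(k\) each, and is equal to \(\frac{|D|}{k} \cdot k^2 = |D| \cdot k\).
Otherwise, let \(R = |D| \bmod k\) be the number of remainder records. The best DM is obtained when each remainder record is added to a different EQ. This results in \(\left\lfloor \frac{|D|}{k} \right\rfloor - R\) EQs of size \(k\) and \(R\) EQs of size \(k + 1\), giving \(DM = \left(\left\lfloor \frac{|D|}{k} \right\rfloor - R\right) \cdot k^2 + R \cdot (k + 1)^2\).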
- Parameters:
org_data (DataFrame) – The data to inspect.
k (int, default 1) – The privacy parameter k.
- Returns:
float – The calculated best-effort discernibility.
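The two cases described above can be sketched in plain Python. The function name `best_effort_sketch` is hypothetical, and it takes the record count directly (for example `len(org_data)`) rather than a DataFrame. The description does not cover the case where there are more remainder records than size-k classes, so this sketch raises an error there instead of guessing.

```python
def best_effort_sketch(n_records: int, k: int = 1) -> float:
    """Illustrative best-effort DM; not the library's own code."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # n_groups = floor(|D| / k), remainder = R = |D| mod k
    n_groups, remainder = divmod(n_records, k)
    if remainder == 0:
        # |D| / k classes, each of size k.
        return float(n_groups * k ** 2)
    if remainder > n_groups:
        # Assumption: the documented formula needs one class per
        # remainder record, so this case is out of scope here.
        raise ValueError("more remainder records than classes of size k")
    # (n_groups - R) classes of size k and R classes of size k + 1.
    return float((n_groups - remainder) * k ** 2
                 + remainder * (k + 1) ** 2)


even_case = best_effort_sketch(10, k=5)    # two classes of size 5
uneven_case = best_effort_sketch(11, k=5)  # classes of sizes 5 and 6
```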
- static calculate_from_equivalence_classes(equivalence_classes: list, suppression_counts: int = 0)
Calculate the discernibility from equivalence classes.
- Parameters:
equivalence_classes (list[{qid, count}]) – A list of dictionaries, where each dictionary contains a ‘count’ key representing the size of an equivalence class.
suppression_counts (int, default 0) – The number of suppressed records.
- Returns:
float – The calculated discernibility.
See also
k_anonymization.evaluation.anonymity.get_equivalence_classes – Get all equivalence classes.
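When the equivalence classes are already available, the same formula can be applied to their counts without touching the data. The sketch below is not the library's implementation: the function name `discernibility_from_classes_sketch`, the tuple shape of the `qid` values, and the choice of \(|D|\) as the sum of all counts plus `suppression_counts` are assumptions made for illustration.

```python
def discernibility_from_classes_sketch(equivalence_classes: list,
                                       suppression_counts: int = 0) -> float:
    """Illustrative DM from class counts; not the library's own code."""
    # Only the 'count' key of each dictionary is needed.
    sizes = [eq["count"] for eq in equivalence_classes]
    # Assumption: |D| = records in all classes + suppressed records.
    total = sum(sizes) + suppression_counts
    return float(sum(size ** 2 for size in sizes)
                 + suppression_counts * total)


# Hypothetical equivalence classes; the 'qid' values are placeholders.
classes = [
    {"qid": ("20-30", "130**"), "count": 2},
    {"qid": ("30-40", "148**"), "count": 3},
]
dm_from_classes = discernibility_from_classes_sketch(
    classes, suppression_counts=1
)
```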