NCP
- class NCP
Bases: object
Normalized Certainty Penalty (NCP).
NCP applies a penalty to each value of every quasi-identifier (QID) attribute and normalizes the result to [0, 1] by the data size \(|D|\) and the number of QIDs \(|Q|\). A lower NCP indicates lower information loss.
\[NCP = \frac{1}{|D|} \sum^{|D|} \frac{\sum P_{num}(v_{num}) + \sum P_{cat}(v_{cat})}{|Q|}\]
where \(P_{num}(v_{num})\) is the penalty for a value of a numerical QID attribute and \(P_{cat}(v_{cat})\) is the penalty for a value of a categorical QID attribute. \(P_{num}\) and \(P_{cat}\) are calculated differently depending on the anonymization method.
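As a rough illustration of the averaging in the formula above, the following sketch assumes the per-value penalties have already been computed and are stored as one row per record and one entry per QID. The function name `ncp_from_penalties` and the nested-list layout are hypothetical and are not part of this class.

```python
def ncp_from_penalties(penalties):
    """Average per-value penalties into a single NCP score.

    `penalties` is a list of rows (one per record); each row holds one
    penalty in [0, 1] per QID attribute. This layout is an assumption
    made for illustration only.
    """
    if not penalties:
        raise ValueError("penalties must contain at least one record")
    n_qids = len(penalties[0])
    # Inner division by |Q| normalizes a record; outer division by |D|
    # normalizes over the whole data set.
    per_record = [sum(row) / n_qids for row in penalties]
    return sum(per_record) / len(penalties)


# Two records, two QIDs each.
example_score = ncp_from_penalties([[0.5, 1.0], [0.0, 0.5]])
```

Because every penalty lies in [0, 1], dividing by \(|Q|\) and then by \(|D|\) keeps the final score in [0, 1] as well.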
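Methods
calculate_for_generalization(org_data, anon_data, hierarchies, qids_idx, is_categorical) – Calculate NCP for generalization anonymization.
calculate_for_local_recoding_mean_mode(org_data, groups, qids_idx, is_categorical) – Calculate NCP for local recoding algorithm with mean-mode group anonymization.
calculate_for_local_recoding_summarization(org_data, groups, qids_idx, is_categorical) – Calculate NCP for local recoding algorithm with summarization group anonymization.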
- static calculate_for_generalization(org_data: DataFrame, anon_data: DataFrame, hierarchies: HierarchiesDict, qids_idx: list, is_categorical: list)
Calculate NCP for generalization anonymization.
When a numerical value is generalized to a (local) numerical range, it becomes ambiguous within that range. Thus, \(P_{num} = \frac{local\_range}{global\_range}\).
A categorical value becomes ambiguous among the leaves under the common ancestor of its equivalence class. Thus, \(P_{cat} = \frac{leaves\_under\_common\_ancestor}{all\_leaves}\).
- Parameters:
org_data (DataFrame) – The original data.
anon_data (DataFrame) – The anonymized data.
hierarchies (HierarchiesDict) – Hierarchy definitions for the QID attributes.
qids_idx (list) – The column indices of the QID attributes.
is_categorical (list) – A list of booleans indicating if a QID attribute is categorical.
- Returns:
float – The NCP score.
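The two penalties described for generalization can be sketched as follows. This is a minimal illustration, not the library's implementation: the hierarchy is represented here as a plain parent-to-children dictionary (an assumption; the actual `HierarchiesDict` structure is not shown in this section), and the helper names are hypothetical. The categorical ratio is transcribed as written above; how an ungeneralized leaf value is treated is not stated here.

```python
def numerical_penalty(local_min, local_max, global_min, global_max):
    """P_num = local_range / global_range."""
    global_range = global_max - global_min
    if global_range == 0:
        # A constant attribute carries no ambiguity (assumed convention).
        return 0.0
    return (local_max - local_min) / global_range


def count_leaves(children, node):
    """Count the leaves below `node` in a parent -> children mapping."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(count_leaves(children, kid) for kid in kids)


def categorical_penalty(children, root, ancestor):
    """P_cat = leaves under the common ancestor / all leaves."""
    return count_leaves(children, ancestor) / count_leaves(children, root)


# Hypothetical hierarchy with five leaves.
example_hierarchy = {
    "*": ["Europe", "Asia"],
    "Europe": ["France", "Spain"],
    "Asia": ["Japan", "China", "India"],
}
age_penalty = numerical_penalty(20, 30, 0, 100)
region_penalty = categorical_penalty(example_hierarchy, "*", "Europe")
```

In this example an age generalized to the range 20 to 30 out of a global range of 0 to 100 is penalized by 0.1, and a country generalized to "Europe" is penalized by 2 of 5 leaves, or 0.4.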
- static calculate_for_local_recoding_mean_mode(org_data: DataFrame, groups: list, qids_idx: list, is_categorical: list)
Calculate NCP for local recoding algorithm with mean-mode group anonymization.
When a numerical value is generalized to a (local) numerical range, it becomes ambiguous within that range. Thus, \(P_{num} = \frac{local\_range}{global\_range}\).
For a categorical value, if the original value differs from the mode of its equivalence class, the entire value is lost. Thus, \(P_{cat}(v_{cat}) = 1\) if \(v_{cat} \neq mode\), and \(0\) otherwise.
- Parameters:
org_data (DataFrame) – The original data.
groups (list) – The anonymized data.
qids_idx (list) – The column indices of the QID attributes.
is_categorical (list) – A list of booleans indicating if a QID attribute is categorical.
- Returns:
float – The NCP score.
See also
k_anonymization.algorithms.local_recoding.GroupAnonymizationBuiltIn.MEAN_MODE – Anonymize a group by mean and mode.
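The categorical rule for mean-mode anonymization can be illustrated with a small sketch. The function name is hypothetical, the input is assumed to be the original categorical values of a single equivalence class, and the tie-breaking between equally frequent values shown here (first encountered wins, as in `collections.Counter`) is an assumption that may differ from the library's behavior.

```python
from collections import Counter


def mean_mode_categorical_penalties(values):
    """Return 0 for values equal to the group's mode and 1 otherwise."""
    if not values:
        raise ValueError("an equivalence class must contain at least one value")
    # most_common(1) yields the most frequent value; ties resolve to the
    # value seen first (assumed convention, see the note above).
    mode = Counter(values).most_common(1)[0][0]
    return [0 if value == mode else 1 for value in values]


# Equivalence class with mode "A": only the "B" record loses its value.
example_penalties = mean_mode_categorical_penalties(["A", "A", "B"])
```

Records whose value already equals the mode keep it after anonymization, which is why they contribute no penalty.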
- static calculate_for_local_recoding_summarization(org_data: DataFrame, groups: list, qids_idx: list, is_categorical: list)
Calculate NCP for local recoding algorithm with summarization group anonymization.
When a numerical value is generalized to a (local) numerical range, it becomes ambiguous within that range. Thus, \(P_{num} = \frac{local\_range}{global\_range}\).
A categorical value becomes ambiguous among the unique values of its QID attribute within its equivalence class, denoted \(Q^{EQ}(v_{cat})\). Thus, \(P_{cat}(v_{cat}) = \frac{1}{|Q^{EQ}(v_{cat}).unique|}\).
- Parameters:
org_data (DataFrame) – The original data.
groups (list) – The anonymized data.
qids_idx (list) – The column indices of the QID attributes.
is_categorical (list) – A list of booleans indicating if a QID attribute is categorical.
- Returns:
float – The NCP score.
See also
k_anonymization.algorithms.local_recoding.GroupAnonymizationBuiltIn.SUMMARIZATION – Anonymize a group by creating a summary range or set.
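The sketch below evaluates the two summarization penalties for a single equivalence class. It transcribes the expressions exactly as stated in this section and has not been checked against the library's source; the function names and the plain-list inputs are hypothetical.

```python
def group_numerical_penalty(group_values, global_min, global_max):
    """P_num = (max - min of the group) / global_range."""
    global_range = global_max - global_min
    if global_range == 0:
        # A constant attribute carries no ambiguity (assumed convention).
        return 0.0
    return (max(group_values) - min(group_values)) / global_range


def group_categorical_penalty(group_values):
    """P_cat = 1 / number of unique values in the equivalence class,
    as the formula is written in this section."""
    return 1 / len(set(group_values))


# One equivalence class with a numerical and a categorical QID.
ages_penalty = group_numerical_penalty([25, 30, 35], 0, 100)
city_penalty = group_categorical_penalty(["A", "B", "A", "C"])
```

Here the summary range 25 to 35 covers one tenth of the global range, and the categorical summary set contains three distinct values.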