NCP

`NCP`

class NCP

Bases: object

Normalized Certainty Penalty (NCP).

NCP applies a penalty to each value of every QID attribute and normalizes the total to [0, 1] by the data size \(|D|\) and the number of QIDs \(|Q|\). A lower NCP indicates lower information loss.

\[NCP = \frac{1}{|D|} \cdot \sum_{i=1}^{|D|}\frac{\sum P_{num}(v_{num}) + \sum P_{cat}(v_{cat})}{|Q|}\]

where \(P_{num}(v_{num})\) is the penalty for a value of a numerical QID attribute and \(P_{cat}(v_{cat})\) is the penalty for a value of a categorical QID attribute. Depending on the anonymization method, \(P_{num}\) and \(P_{cat}\) are calculated differently.
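The averaging in the formula above can be sketched in plain Python. This is an illustration only, not the class's implementation: the function name `ncp_from_penalties` and the penalty values are hypothetical, and the sketch assumes the per-value penalties have already been computed and lie in [0, 1].

```python
def ncp_from_penalties(penalties):
    """Average a |D| x |Q| table of per-value penalties (hypothetical helper).

    penalties: list of records, each a list with one penalty per QID.
    """
    n_records = len(penalties)
    n_qids = len(penalties[0])
    # Inner step: sum the penalties of one record and divide by |Q|.
    per_record = [sum(row) / n_qids for row in penalties]
    # Outer step: sum over all records and divide by |D|.
    return sum(per_record) / n_records


# Made-up penalties for 4 records and 2 QIDs.
penalties = [
    [0.5, 0.0],
    [0.5, 1.0],
    [0.0, 0.5],
    [1.0, 0.5],
]
score = ncp_from_penalties(penalties)
print(score)
```

Because every penalty is in [0, 1] and the result is divided by both \(|Q|\) and \(|D|\), the score stays in [0, 1] as well.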

Methods

`calculate_for_generalization`	Calculate NCP for generalization anonymization.
`calculate_for_local_recoding_mean_mode`	Calculate NCP for local recoding algorithm with mean-mode group anonymization.
`calculate_for_local_recoding_summarization`	Calculate NCP for local recoding algorithm with summarization group anonymization.

static calculate_for_generalization(org_data: DataFrame, anon_data: DataFrame, hierarchies: HierarchiesDict, qids_idx: list, is_categorical: list)

Calculate NCP for generalization anonymization.

When a numerical value is generalized to a (local) numerical range, it becomes ambiguous in such a range. Thus, \(P_{num} = \frac{local\_range}{global\_range}\).

A categorical value becomes ambiguous among the leaves under the common ancestor of its equivalence class. Thus, \(P_{cat} = \frac{leaves\_under\_common\_ancestor}{all\_leaves}\).

Parameters:

org_data (DataFrame) – The original data.
anon_data (DataFrame) – The anonymized data.
hierarchies (HierarchiesDict) – Hierarchy definitions for the QID attributes.
qids_idx (list) – The column indices of the QID attributes.
is_categorical (list) – A list of booleans indicating if a QID attribute is categorical.

Returns:

float – The NCP score.
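The two penalties used for generalization can be worked through with small numbers. The sketch below is an assumption-laden illustration, not this method's code: the helper names, the age bounds, and the leaf counts are all hypothetical, and the hierarchy is reduced to two leaf counts instead of a real `HierarchiesDict`.

```python
def numeric_penalty(local_low, local_high, global_low, global_high):
    """P_num = local_range / global_range (hypothetical helper)."""
    global_range = global_high - global_low
    if global_range == 0:
        # A constant attribute carries no information to lose.
        return 0.0
    return (local_high - local_low) / global_range


def categorical_penalty(leaves_under_common_ancestor, all_leaves):
    """P_cat = leaves under the common ancestor / all leaves (hypothetical helper)."""
    return leaves_under_common_ancestor / all_leaves


# An age generalized to [20, 30] when ages in the data span [0, 100].
p_age = numeric_penalty(20, 30, 0, 100)

# A job generalized to an ancestor covering 2 of the 8 leaves in its hierarchy.
p_job = categorical_penalty(2, 8)

# Contribution of this record with |Q| = 2.
record_penalty = (p_age + p_job) / 2
print(p_age, p_job, record_penalty)
```

A wider interval or a higher ancestor in the hierarchy pushes the corresponding penalty toward 1.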

static calculate_for_local_recoding_mean_mode(org_data: DataFrame, groups: list, qids_idx: list, is_categorical: list)

Calculate NCP for local recoding algorithm with mean-mode group anonymization.

When a numerical value is generalized to a (local) numerical range, it becomes ambiguous in such a range. Thus, \(P_{num} = \frac{local\_range}{global\_range}\).

For a categorical value, if the original value differs from the mode of its equivalence class, the entire value is lost. Thus, \(P_{cat}(v_{cat}) = 1\) if \(v_{cat} \neq mode\), and \(0\) otherwise.

Parameters:

org_data (DataFrame) – The original data.
groups (list) – The anonymized data.
qids_idx (list) – The column indices of the QID attributes.
is_categorical (list) – A list of booleans indicating if a QID attribute is categorical.

Returns:

float – The NCP score.
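The mean-mode penalties can be illustrated on a single equivalence class. This is a minimal sketch under stated assumptions, not this method's code: the helper names, the group values, and the global age bounds are hypothetical, and the group is given as plain lists, not in the format that `groups` expects.

```python
from collections import Counter


def mode_penalties(values):
    """P_cat per value: 0 if the value equals the group mode, 1 otherwise."""
    mode = Counter(values).most_common(1)[0][0]
    return [0.0 if v == mode else 1.0 for v in values]


def range_penalty(values, global_low, global_high):
    """P_num for the group: local_range / global_range."""
    global_range = global_high - global_low
    if global_range == 0:
        return 0.0
    return (max(values) - min(values)) / global_range


# One hypothetical equivalence class of three records.
group_ages = [25, 30, 35]
group_jobs = ["nurse", "nurse", "teacher"]

num_penalty = range_penalty(group_ages, 20, 60)  # ages in the data span [20, 60]
cat_penalties = mode_penalties(group_jobs)

# Average over the records of the group with |Q| = 2.
group_score = sum((num_penalty + p) / 2 for p in cat_penalties) / len(cat_penalties)
print(num_penalty, cat_penalties, group_score)
```

Only the record whose job differs from the mode pays the full categorical penalty, while all three records share the same numerical penalty.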
