Perturbation
- class Perturbation(dataset: Dataset, k: int, seed: int = None)
Bases: Algorithm
Implementation of the Perturbation algorithm.
Perturbation uses a technique inspired by Differential Privacy that adds controlled noise to the dataset. It applies “Retention-Replacement” to categorical attributes and “Laplacian Noise” to numerical attributes.
- Parameters:
dataset (Dataset) – The Dataset object holding the original data and its metadata.
k (int) – The target privacy parameter k, from which the retention parameter p and the Laplace scale parameter b are derived.
seed (int, optional) – Random seed for reproducibility.
Methods
anonymize() – Run the Perturbation algorithm.
do_laplacian_noise() – Apply Laplacian noise to numerical attributes.
do_retention_replacement() – Apply Retention-Replacement perturbation to categorical attributes.
solve_b_given_k() – Calculate the scale parameter b for Laplacian noise.
solve_p_given_k(acceptance_error=1e-06) – Solve for the retention parameter p using the bisection method.
- anonymize()
Run the Perturbation algorithm.
Applies categorical perturbation followed by numerical perturbation, then reconstructs the finalized anonymized data object.
- do_laplacian_noise()
Apply Laplacian noise to numerical attributes.
Adds random noise sampled from a Laplace distribution centered at zero. The resulting values are truncated to ensure they stay within the original attribute’s min/max range.
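The noise-then-clip step described above can be sketched with the standard library alone. This is a minimal illustration, not the class's actual implementation: the function name `add_laplace_noise`, the sample values, and the choice of `b=5.0` are all assumptions made for the example.

```python
import random


def add_laplace_noise(values, b, seed=None):
    """Add Laplace(0, b) noise to each value, then clip to the original range.

    Hypothetical helper for illustration; b must be positive.
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    noisy = []
    for v in values:
        # The difference of two i.i.d. exponentials with mean b
        # follows a Laplace distribution centered at zero with scale b.
        noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        # Clip so the perturbed value stays inside the original [min, max].
        noisy.append(min(max(v + noise, lo), hi))
    return noisy


ages = [23, 35, 41, 58, 62]  # made-up numerical attribute
noisy_ages = add_laplace_noise(ages, b=5.0, seed=0)
```

Passing the same `seed` reproduces the same noisy values, which mirrors the purpose of the `seed` parameter on the class.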
- do_retention_replacement()
Apply Retention-Replacement perturbation to categorical attributes.
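For each value, the original value is retained with probability \(p + \frac{1-p}{size}\) and is replaced by each specific other value from the domain with probability \(\frac{1-p}{size}\), where size is the number of distinct values in the attribute's domain. Summed over the \(size - 1\) other values, these probabilities total 1.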
Notes
A temporary suffix #ReRe# marks perturbed values during processing so that they can be distinguished from original values and are not perturbed again within the same column loop.
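One way to obtain the stated probabilities is to keep a value with probability p and otherwise redraw uniformly from the whole domain, original value included. The sketch below assumes that reading; `retention_replacement` and `retention_probability` are hypothetical names, and the sample column is made up. Because it builds a new list instead of editing the column in place, it needs no marker suffix.

```python
import random


def retention_replacement(column, p, seed=None):
    """Keep each value with probability p; otherwise redraw uniformly from the domain."""
    rng = random.Random(seed)
    domain = sorted(set(column))
    out = []
    for value in column:
        if rng.random() < p:
            # Retained outright.
            out.append(value)
        else:
            # Uniform redraw over the full domain; it may return the original,
            # which is where the extra (1 - p) / size retention term comes from.
            out.append(rng.choice(domain))
    return out


def retention_probability(p, size):
    """Overall chance that a value is unchanged: p + (1 - p) / size."""
    return p + (1.0 - p) / size


colors = ["red", "green", "blue", "red", "green"]  # made-up categorical attribute
perturbed = retention_replacement(colors, p=0.6, seed=42)
```

With p = 1 every value is kept, and with p = 0 every value is redrawn uniformly, so p directly controls how much of the original column survives.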
- solve_b_given_k()
Calculate the scale parameter b for Laplacian noise.
This parameter determines the ‘width’ of the noise distribution needed to obscure numerical values sufficiently to reach the target privacy level.
- Returns:
float – The scale parameter b (sigma) for the Laplace distribution.
- solve_p_given_k(acceptance_error=1e-06)
Solve for the retention parameter p using the bisection method.
In the Retention-Replacement model, k is a function of p. Because k decreases monotonically as p increases over the range [0, 1], this method repeatedly halves the search interval until it finds the p whose calculated k matches the target k within the tolerance.
- Parameters:
acceptance_error (float, default 1e-6) – The tolerance level for the difference between the calculated k and the target k.
- Returns:
float – The optimal parameter p for categorical perturbation.
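The bisection search described above can be sketched as follows. The relationship between k and p is not given here, so the sketch takes it as an argument: `k_of_p` is a placeholder, and `toy_k` is an invented decreasing function used only to exercise the search, not the model's real formula. The `max_iter` guard is likewise an assumption.

```python
def solve_p_by_bisection(k_of_p, target_k, acceptance_error=1e-6, max_iter=200):
    """Find p in [0, 1] with k_of_p(p) close to target_k.

    Assumes k_of_p decreases monotonically as p increases.
    """
    lo, hi = 0.0, 1.0
    mid = 0.5
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        k_mid = k_of_p(mid)
        if abs(k_mid - target_k) <= acceptance_error:
            break
        if k_mid > target_k:
            # k is too large, and k falls as p rises, so search the upper half.
            lo = mid
        else:
            # k is too small, so search the lower half.
            hi = mid
    return mid


def toy_k(p, size=10):
    """Invented stand-in: falls linearly from size at p = 0 to 1 at p = 1."""
    return 1.0 + (size - 1) * (1.0 - p)


p = solve_p_by_bisection(toy_k, target_k=4.0)
```

For the stand-in function the exact answer is p = 2/3, so the returned value should agree with it to within the tolerance implied by `acceptance_error`.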