Perturbation

class Perturbation(dataset: Dataset, k: int, seed: int = None)

Bases: Algorithm

Implementation of Perturbation algorithm.

Perturbation uses a technique inspired by Differential Privacy that adds controlled noise to the dataset. It applies “Retention-Replacement” to categorical attributes and “Laplacian Noise” to numerical attributes.

Parameters:
  • dataset (Dataset) – The Dataset object holding the original data and its metadata.

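  • k (int) – The target privacy parameter k. It determines the retention parameter p used for categorical attributes and the Laplace scale b used for numerical attributes.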

  • seed (int, optional) – Random seed for reproducibility.

Methods

anonymize

Run the Perturbation algorithm.

do_laplacian_noise

Apply Laplacian noise to numerical attributes.

do_retention_replacement

Apply Retention-Replacement perturbation to categorical attributes.

solve_b_given_k

Calculate the scale parameter b for Laplacian noise.

solve_p_given_k

Solve for the retention parameter p using the bisection method.

anonymize()

Run the Perturbation algorithm.

Applies Retention-Replacement to the categorical attributes, then Laplacian noise to the numerical attributes, and finally reconstructs the anonymized data object from the perturbed values.
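The ordering described above can be sketched in plain Python. This is a minimal sketch, not the library's implementation: it assumes the data is held in a hypothetical column-oriented dict (column name to list of values), and `anonymize_sketch`, `perturb_cat`, and `perturb_num` are illustrative names standing in for the two perturbation steps.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical column-oriented layout: column name -> list of values.
Columns = Dict[str, List]


def anonymize_sketch(
    columns: Columns,
    categorical: Sequence[str],
    numerical: Sequence[str],
    perturb_cat: Callable[[List], List],
    perturb_num: Callable[[List], List],
) -> Columns:
    """Perturb categorical columns first, then numerical ones, into a new object."""
    # Copy every column so the original data is left untouched.
    result = {name: list(values) for name, values in columns.items()}
    # Step 1: categorical perturbation (Retention-Replacement in the documented class).
    for name in categorical:
        result[name] = perturb_cat(result[name])
    # Step 2: numerical perturbation (Laplacian noise in the documented class).
    for name in numerical:
        result[name] = perturb_num(result[name])
    # Columns listed in neither group pass through unchanged.
    return result
```

Building a new object rather than mutating the input mirrors the documented step of reconstructing a finalized anonymized data object; whether the real class copies its input is not stated in this reference.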

do_laplacian_noise()

Apply Laplacian noise to numerical attributes.

Adds random noise drawn from a zero-centered Laplace distribution with scale b. The noisy values are then truncated (clipped) so that they stay within the original attribute’s minimum and maximum.
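A minimal sketch of this step for a single numerical column, using only the Python standard library. It assumes b is already known (the class obtains it from solve_b_given_k) and that truncation means clipping each noisy value to the column's observed minimum and maximum; `add_laplace_noise` is a hypothetical name. Laplace(0, b) samples are drawn as the difference of two independent exponential variables with mean b, which has exactly that distribution.

```python
import random
from typing import List, Sequence


def add_laplace_noise(values: Sequence[float], b: float, rng: random.Random) -> List[float]:
    """Add zero-centered Laplace(0, b) noise, then clip to the original min/max."""
    if b <= 0:
        raise ValueError("scale b must be positive")
    lo, hi = min(values), max(values)
    noisy = []
    for v in values:
        # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
        noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        # Clip so the perturbed value stays inside the attribute's original range.
        noisy.append(min(max(v + noise, lo), hi))
    return noisy


ages = [18, 25, 40, 67]
print(add_laplace_noise(ages, b=5.0, rng=random.Random(0)))
```

Clipping keeps values in range but concentrates probability mass on the boundaries, and the sketch returns floats even for integer input; whether the library rounds integer attributes is not covered here.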

do_retention_replacement()

Apply Retention-Replacement perturbation to categorical attributes.

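For each value, the original is retained with probability \(p\); otherwise it is replaced by a value drawn uniformly from the attribute’s domain, which may be the original value itself. As a result, the output equals the original value with probability \(p + \frac{1-p}{size}\) and equals any particular other domain value with probability \(\frac{1-p}{size}\), where size is the number of distinct values in the domain.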
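Those probabilities follow from a two-step draw: keep the value with probability p, otherwise draw uniformly from the whole domain. For a domain of size 4 and p = 0.6, the original survives with probability 0.6 + 0.4/4 = 0.7 and each of the other three values appears with probability 0.1, for a total of 0.7 + 3 × 0.1 = 1. The sketch below is a minimal illustration, assuming size is the number of distinct values in the column; `retention_replacement` is a hypothetical helper, not the library's method.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def retention_replacement(
    values: Sequence[str],
    p: float,
    rng: random.Random,
    domain: Optional[Sequence[str]] = None,
) -> List[str]:
    """Keep each value with probability p, else replace it with a uniform draw from the domain."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Default domain: distinct values of the column, in first-seen order.
    domain = list(dict.fromkeys(values)) if domain is None else list(domain)
    out = []
    for v in values:
        if rng.random() < p:
            out.append(v)  # retained
        else:
            out.append(rng.choice(domain))  # replaced; the draw may return v itself
    return out


# Empirical check: an all-"A" column over a 4-value domain with p = 0.6.
column = ["A"] * 20_000
counts = Counter(retention_replacement(column, 0.6, random.Random(7), domain=["A", "B", "C", "D"]))
print({k: counts[k] / len(column) for k in "ABCD"})  # approximately 0.7, 0.1, 0.1, 0.1
```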

Notes

A temporary suffix, #ReRe#, is appended to perturbed values during processing. It distinguishes them from original values, so a value that has already been perturbed is not perturbed again later in the same column loop.
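The need for a marker shows up in a loop that perturbs a column one distinct value at a time, in place. Without a marker, a row changed from "A" to "B" matches again when the loop reaches "B" and is perturbed a second time. The sketch appends "#ReRe#" to every written value so it can no longer match an unmarked domain value, then strips the suffix at the end. The per-value loop structure and the name `perturb_column_in_place` are assumptions for illustration, and the sketch assumes no original value already ends with "#ReRe#".

```python
import random
from typing import List

MARKER = "#ReRe#"


def perturb_column_in_place(
    column: List[str], p: float, rng: random.Random, use_marker: bool = True
) -> List[int]:
    """Perturb `column` in place; return how many times each row was perturbed."""
    domain = list(dict.fromkeys(column))
    touches = [0] * len(column)
    for original in domain:
        for i, current in enumerate(column):
            # A marked value such as "B#ReRe#" never equals the unmarked "B".
            if current != original:
                continue
            touches[i] += 1
            new = current if rng.random() < p else rng.choice(domain)
            column[i] = new + MARKER if use_marker else new
    if use_marker:
        # Remove the temporary suffix once every domain value has been processed.
        for i, value in enumerate(column):
            if value.endswith(MARKER):
                column[i] = value[: -len(MARKER)]
    return touches


with_marker = perturb_column_in_place(["A", "B", "C"] * 100, 0.0, random.Random(1))
without_marker = perturb_column_in_place(["A", "B", "C"] * 100, 0.0, random.Random(1), use_marker=False)
print(max(with_marker), max(without_marker))
```

With the marker every row is perturbed exactly once; without it, rows moved to a value that the loop visits later are perturbed again.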

solve_b_given_k()

Calculate the scale parameter b for Laplacian noise.

This parameter sets the spread (‘width’) of the noise distribution: the larger b is, the more noise is added. It is chosen so that numerical values are obscured enough to reach the target privacy level k.

Returns:

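float – The scale parameter b (sigma) of the Laplace distribution. Despite the name sigma, b is not the standard deviation: a Laplace distribution with scale b has standard deviation \(\sqrt{2}\,b\).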
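How b is derived from k is not spelled out in this reference, so the sketch takes b as given and only illustrates what the returned value controls. It draws Laplace(0, b) noise with the standard library (as the difference of two exponentials with mean b) and checks two textbook properties: the mean absolute noise equals b, and the standard deviation equals \(\sqrt{2}\,b\). `laplace_noise` is a hypothetical helper.

```python
import math
import random
from typing import List


def laplace_noise(b: float, n: int, rng: random.Random) -> List[float]:
    """Draw n samples from Laplace(0, b) as differences of two exponentials with mean b."""
    return [rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for _ in range(n)]


b = 2.0
samples = laplace_noise(b, 100_000, random.Random(42))
n = len(samples)
mean_abs = math.fsum(abs(x) for x in samples) / n
mean = math.fsum(samples) / n
std = math.sqrt(math.fsum((x - mean) ** 2 for x in samples) / n)
print(f"mean |noise| = {mean_abs:.3f} (expected {b})")
print(f"std of noise = {std:.3f} (expected {math.sqrt(2) * b:.3f})")
```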

solve_p_given_k(acceptance_error=1e-06)

Solve for the retention parameter p using the bisection method.

In the Retention-Replacement model, k is a function of p. Because k decreases monotonically as p increases over the range [0, 1], the method repeatedly halves the search interval for p until the computed k is within acceptance_error of the target k.

Parameters:

acceptance_error (float, default 1e-6) – The tolerance level for the difference between the calculated k and the target k.

Returns:

float – The retention parameter p for categorical perturbation, i.e. the p whose k is within acceptance_error of the target k.
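The search can be sketched as a generic bisection on a decreasing function k(p). The actual formula relating k and p in the Retention-Replacement model is not given in this reference, so `hypothetical_k_of_p` is a stand-in chosen only because it decreases monotonically on [0, 1] (it equals the domain size at p = 0 and 1 at p = 1). `solve_p_bisection` and `max_iter` are likewise illustrative names, not the library's API.

```python
from typing import Callable


def hypothetical_k_of_p(p: float, size: int) -> float:
    """Stand-in k(p): reciprocal of the retention probability p + (1 - p) / size."""
    return 1.0 / (p + (1.0 - p) / size)


def solve_p_bisection(
    k_of_p: Callable[[float], float],
    target_k: float,
    acceptance_error: float = 1e-6,
    max_iter: int = 200,
) -> float:
    """Find p in [0, 1] with |k_of_p(p) - target_k| <= acceptance_error, assuming k_of_p decreases."""
    lo, hi = 0.0, 1.0
    # For a decreasing function, k is largest at p = 0 and smallest at p = 1.
    if not k_of_p(hi) <= target_k <= k_of_p(lo):
        raise ValueError("target_k is not reachable for p in [0, 1]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        k_mid = k_of_p(mid)
        if abs(k_mid - target_k) <= acceptance_error:
            return mid
        if k_mid > target_k:
            lo = mid  # k still too large: more retention (larger p) is needed
        else:
            hi = mid  # k too small: less retention (smaller p) is needed
    return (lo + hi) / 2


p = solve_p_bisection(lambda q: hypothetical_k_of_p(q, size=10), target_k=4.0)
print(round(p, 4))  # 0.1667, since 1 / (0.1 + 0.9 * p) = 4 gives p = 1/6
```

Each iteration halves the interval, so the search needs only a few dozen evaluations of k(p) for typical tolerances.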