Perturbation

`Perturbation`

class Perturbation(dataset: Dataset, k: int, seed: int = None)

Bases: Algorithm

Implementation of Perturbation algorithm.

Perturbation uses a Differential Privacy-inspired technique that adds controlled noise to the dataset: “Retention-Replacement” for categorical attributes and “Laplacian Noise” for numerical attributes.

Parameters:

dataset (Dataset) – The Dataset object holding the original data and its metadata.
k (int) – The privacy parameter k.
seed (int, optional) – Random seed for reproducibility.

Methods

`anonymize`	Run the Perturbation algorithm.
`do_laplacian_noise`	Apply Laplacian noise to numerical attributes.
`do_retention_replacement`	Apply Retention-Replacement perturbation to categorical attributes.
`solve_b_given_k`	Calculate the scale parameter b for Laplacian noise.
`solve_p_given_k`	Solve for the retention parameter p using the bisection method.

anonymize()

Run the Perturbation algorithm.

Applies categorical perturbation followed by numerical perturbation, then reconstructs the finalized anonymized data object.

do_laplacian_noise()

Apply Laplacian noise to numerical attributes.

Adds random noise sampled from a Laplace distribution centered at zero. The resulting values are truncated to ensure they stay within the original attribute’s min/max range.
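The noise-then-truncate step can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name `add_laplace_noise`, its arguments, and the sample data are hypothetical, and the scale `b` is passed in directly rather than derived from k.

```python
import numpy as np

def add_laplace_noise(values, b, seed=None):
    """Add zero-centered Laplace noise with scale b, then clip to the original range.

    Hypothetical helper for illustration; not part of the Perturbation class.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()  # original attribute range
    noise = rng.laplace(loc=0.0, scale=b, size=values.shape)
    # Clipping keeps every perturbed value inside [lo, hi].
    return np.clip(values + noise, lo, hi)

ages = np.array([23, 35, 41, 58, 62])  # made-up numerical attribute
noisy = add_laplace_noise(ages, b=5.0, seed=0)
```

A larger `b` widens the noise distribution, so more values end up pinned to the minimum or maximum after clipping.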

do_retention_replacement()

Apply Retention-Replacement perturbation to categorical attributes.

Each value is perturbed independently. The original value is retained with probability \(p + \frac{1-p}{size}\), and it is replaced by any one particular other value from the domain with probability \(\frac{1-p}{size}\), where \(size\) is the number of values in the attribute's domain.

Notes

A temporary suffix `#ReRe#` is used during processing to distinguish original values from perturbed ones, which prevents recursive perturbation within the same column loop.
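The retention and replacement probabilities for categorical attributes can be checked empirically with a short sketch. It assumes the common formulation in which a value is kept with probability p and otherwise redrawn uniformly from the whole domain (original value included), which yields the retention probability \(p + \frac{1-p}{size}\). The function name and data are hypothetical and this is not the library's code. Because it writes into a new list, it does not need a marker such as `#ReRe#`.

```python
import random
from collections import Counter

def retention_replacement(values, domain, p, seed=None):
    """Keep each value with probability p; otherwise redraw uniformly from the domain.

    Hypothetical helper for illustration; not part of the Perturbation class.
    """
    rng = random.Random(seed)
    domain = list(domain)
    out = []
    for v in values:
        if rng.random() < p:
            out.append(v)                   # retained outright
        else:
            out.append(rng.choice(domain))  # redrawn; may land on v again
    return out

domain = ["A", "B", "C", "D"]               # size = 4
original = ["A"] * 100_000
perturbed = retention_replacement(original, domain, p=0.6, seed=42)

counts = Counter(perturbed)
retained_fraction = counts["A"] / len(perturbed)  # theory: 0.6 + 0.4/4 = 0.7
replaced_by_b = counts["B"] / len(perturbed)      # theory: 0.4/4 = 0.1
```

With 100,000 samples the observed fractions should sit close to the theoretical 0.7 and 0.1.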

solve_b_given_k()

Calculate the scale parameter b for Laplacian noise.

This parameter determines the ‘width’ of the noise distribution needed to obscure numerical values sufficiently to reach the target privacy level.

Returns: float – The scale parameter b (sigma) for the Laplace distribution.

solve_p_given_k(acceptance_error=1e-06)

Solve for the retention parameter p using the bisection method.

In the Retention-Replacement model, k is a function of p. Because k decreases monotonically as p increases over the range [0, 1], this method repeatedly halves the search interval until it finds the p whose k matches the target k within the accepted error.

Parameters: acceptance_error (float, default 1e-6) – The tolerance level for the difference between the calculated k and the target k.
Returns: float – The optimal parameter p for categorical perturbation.
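The bisection search over a monotonically decreasing function can be sketched as follows. The actual relationship between k and p is not given here, so `toy_k` is a purely hypothetical stand-in chosen only because it decreases on [0, 1]. `bisect_decreasing` and its `max_iter` argument are likewise illustrative names, not the library's API.

```python
def bisect_decreasing(f, target, lo=0.0, hi=1.0, acceptance_error=1e-6, max_iter=200):
    """Find x in [lo, hi] with |f(x) - target| <= acceptance_error.

    Assumes f is monotonically decreasing on [lo, hi]. Hypothetical helper.
    """
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        value = f(mid)
        if abs(value - target) <= acceptance_error:
            return mid
        if value > target:
            lo = mid  # f(mid) too large; f decreases, so the solution lies to the right
        else:
            hi = mid  # f(mid) too small; the solution lies to the left
    return (lo + hi) / 2.0

def toy_k(p):
    # Hypothetical stand-in for k(p); NOT the formula used by Perturbation.
    return 10.0 - 9.0 * p

p = bisect_decreasing(toy_k, target=4.0)
```

Each iteration halves the interval, so a tolerance of 1e-6 is typically reached in a few dozen steps.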
