CAVG

class CAVG

Bases: object

(Normalized) Average Equivalence Class Size (\(C_{AVG}\)).

\(C_{AVG}\) estimates the trade-off between information loss and privacy protection from the average size of the equivalence classes, normalized by k. A value closer to 1 indicates a more balanced trade-off. The metric assumes that a minimal-loss k-anonymization algorithm produces equivalence classes that are all of size k. In other words, an equivalence class larger than k indicates over-anonymization that caused unnecessary information loss.

\[C_{AVG} = \frac{|D|}{|EQs| \cdot k}\]

where \(|D|\) is the size of the data, \(|EQs|\) is the number of equivalence classes, and \(k\) is the privacy parameter.
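
As an illustration with made-up numbers (not taken from any dataset), consider 12 records and k = 3. If anonymization yields four equivalence classes of three records each, the score is exactly 1. If it yields only two classes of six records each, the score rises to 2, reflecting the over-anonymization described above:

\[\frac{12}{4 \cdot 3} = 1 \qquad \text{versus} \qquad \frac{12}{2 \cdot 3} = 2\]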

Methods

calculate

Calculate CAVG score from the data.

calculate_best_effort

Calculate the best-effort CAVG.

calculate_from_equivalence_classes

Calculate CAVG from equivalence classes.

static calculate(data: DataFrame | ndarray, qids_idx: list, k: int)

Calculate CAVG score from the data.

Parameters:
  • data (DataFrame or ndarray) – The data to inspect.

  • qids_idx (list) – The column indices of the QID attributes.

  • k (int) – The privacy parameter k.

Returns:

float – The calculated CAVG.
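
The computation described above can be sketched independently of the library. The helper below, `cavg_sketch`, is a hypothetical name and not part of this class. It assumes that each distinct combination of QID values forms one equivalence class, and the sample data is invented for illustration.

```python
import pandas as pd


def cavg_sketch(data, qids_idx: list, k: int) -> float:
    """Hypothetical helper (not the library's code): |D| / (|EQs| * k)."""
    # Wrapping in a DataFrame lets the same code accept a DataFrame or an ndarray.
    df = pd.DataFrame(data)
    # Assumption: one equivalence class per distinct combination of QID values.
    num_classes = len(df.iloc[:, qids_idx].drop_duplicates())
    return len(df) / (num_classes * k)


# Invented, already generalized sample data: columns 0 and 1 are the QIDs.
df = pd.DataFrame({
    "age": ["20-30", "20-30", "30-40", "30-40", "30-40", "30-40"],
    "zip": ["130**", "130**", "148**", "148**", "148**", "148**"],
    "disease": ["flu", "cold", "flu", "asthma", "cold", "flu"],
})

score = cavg_sketch(df, qids_idx=[0, 1], k=2)
print(score)  # 6 records / (2 classes * 2) = 1.5
```

With k = 3 the same data scores 1.0, because the average class size (3) then matches k.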

static calculate_best_effort(org_data: DataFrame, k: int = 1)

Calculate the best-effort CAVG.

The best CAVG is reached when the data records are evenly distributed across exactly \(int(\frac{|D|}{k})\) equivalence classes.

\[C_{AVG}\_best = \frac{|D|}{int(\frac{|D|}{k}) \cdot k}\]

where \(|D|\) is the size of the data.

Parameters:
  • org_data (DataFrame) – The original data.

  • k (int, default 1) – The privacy parameter k.

Returns:

float – The calculated best-effort CAVG.
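
A minimal sketch of the best-effort formula follows. The function `best_effort_cavg` is a hypothetical stand-in that takes the record count directly rather than a DataFrame, and it assumes the data holds at least k records (otherwise the class count is zero and the division fails).

```python
def best_effort_cavg(num_records: int, k: int = 1) -> float:
    """Hypothetical helper (not the library's code): |D| / (int(|D| / k) * k)."""
    # int(|D| / k): the largest number of classes that can each hold at least k records.
    num_classes = num_records // k
    return num_records / (num_classes * k)


print(best_effort_cavg(12, k=3))  # 12 / (4 * 3) = 1.0, since 12 divides evenly by 3
print(best_effort_cavg(10, k=3))  # 10 / (3 * 3), slightly above 1
```

The score equals 1 only when \(|D|\) is a multiple of k. Otherwise the leftover records must join existing classes, which pushes the best reachable score slightly above 1.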

static calculate_from_equivalence_classes(equivalence_classes: list, k: int)

Calculate CAVG from equivalence classes.

Parameters:
  • equivalence_classes (list[{qid, count}]) – A list of dictionaries, where each dictionary contains a ‘count’ key representing the size of an equivalence class.

  • k (int) – The privacy parameter k.

Returns:

float – The calculated CAVG.
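
The expected input shape can be illustrated with a short sketch. The helper `cavg_from_classes` is hypothetical and assumes that \(|D|\) equals the sum of the 'count' values. The tuple used for each 'qid' value is also an assumption, since the page does not specify its type.

```python
def cavg_from_classes(equivalence_classes: list, k: int) -> float:
    """Hypothetical helper (not the library's code): |D| / (|EQs| * k)."""
    # Assumption: the data size |D| is the sum of all class sizes.
    total_records = sum(eq["count"] for eq in equivalence_classes)
    return total_records / (len(equivalence_classes) * k)


# Invented equivalence classes; the 'qid' values are illustrative only.
classes = [
    {"qid": ("20-30", "130**"), "count": 2},
    {"qid": ("30-40", "148**"), "count": 4},
]

score = cavg_from_classes(classes, k=2)
print(score)  # 6 records / (2 classes * 2) = 1.5
```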