Hierarchy

class Hierarchy(name: str, hierarchy_df: DataFrame)

Bases: object

Attribute’s generalization hierarchy.

This class stores the generalization mapping for a quasi-identifier (QID) attribute and provides utility functions for querying properties of the hierarchy and its nodes.

Parameters:
  • name (str) – The identifier for the hierarchy (usually the column name).

  • hierarchy_df (pd.DataFrame) – A DataFrame where column 0 is the raw data and subsequent columns are progressively more generalized levels.
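As a rough illustration of the layout that hierarchy_df is described as having, the sketch below models the table with plain Python tuples instead of a DataFrame. The "city" attribute, its values, and the "*" root are made-up examples, not data from the library.

```python
# Toy hierarchy for a hypothetical "city" attribute.
# Each tuple is one row: (level 0 raw value, level 1, level 2).
rows = [
    ("Paris", "France", "*"),
    ("Lyon", "France", "*"),
    ("Berlin", "Germany", "*"),
]

# Transpose the rows into per-level columns, mirroring the DataFrame columns.
levels = [list(column) for column in zip(*rows)]

raw_values = levels[0]        # column 0: the raw data
top_level = len(levels) - 1   # index of the most generalized column

print(raw_values)  # ['Paris', 'Lyon', 'Berlin']
print(top_level)   # 2
```

Reading a row from left to right gives the path from a raw value to its most generalized form.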

Methods

  • contains – Check whether a node exists anywhere in the hierarchy.

  • from_csv – Initialize from a CSV file.

  • from_json – Initialize from a JSON configuration file.

  • get_height_of_node – Get the generalization level for a specific node.

  • get_leaves_under_node – Get all leaves (values at level 0) under the given node.

  • get_lowest_common_ancestor – Find the lowest common ancestor (LCA) of the given nodes.

contains(node_value: any)

Check whether a node exists anywhere in the hierarchy.

Parameters:

node_value (any) – The value of the node to inspect.

Returns:

bool

classmethod from_csv(name: str, path: str, sep: str = ',')

Initialize from a CSV file.

The CSV file must have no header row. The first column holds the raw data, and each following column holds a higher level of generalization.

Parameters:
  • name (str) – Hierarchy name.

  • path (str) – Path to the CSV file.

  • sep (str, default ',') – Separator (delimiter) used in the CSV file.
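To make the expected file layout concrete, here is a minimal sketch that parses a headerless, delimiter-separated hierarchy with the standard csv module. It only illustrates the file format; it is not how from_csv is implemented, and the sample contents and the ';' separator are assumptions chosen for the example.

```python
import csv
import io

# Hypothetical file contents: no header row, ';' as the separator.
csv_text = "Paris;France;*\nLyon;France;*\nBerlin;Germany;*\n"

# io.StringIO stands in for an open file so the sketch runs without touching disk.
rows = list(csv.reader(io.StringIO(csv_text), delimiter=";"))

raw_values = [row[0] for row in rows]  # first column: raw data
level_1 = [row[1] for row in rows]     # second column: first generalization level

print(raw_values)  # ['Paris', 'Lyon', 'Berlin']
print(level_1)     # ['France', 'France', 'Germany']
```

Going by the signature above, loading such a file would presumably look like Hierarchy.from_csv("city", "city.csv", sep=";"), where the file name is hypothetical.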

classmethod from_json(name: str, org_column: DataFrame, json_path: str)

Initialize from a JSON configuration file.

Two ways of defining the generalization are supported:

  1. lambda: Apply lambda functions to derive the next generalization level based on the current value.

  2. tree: Explicitly map a list of original values to a generalized value.

Parameters:
  • name (str) – Hierarchy name.

  • org_column (pd.DataFrame) – The column from the original dataset to use as Level 0.

  • json_path (str) – Path to the JSON configuration file.
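The difference between the two definition styles can be sketched in plain Python. This is a conceptual illustration only: it does not reproduce the JSON schema that from_json expects, and the helper names, the age functions, and the city mapping are all hypothetical.

```python
# Lambda style: each function derives the next level from the current value.
age_steps = [
    lambda age: age // 10 * 10,  # level 1: start of the decade, e.g. 37 -> 30
    lambda decade: "*",          # level 2: fully suppressed
]

def generalize_with_lambdas(value, steps):
    """Return [level 0, level 1, ...] by applying each step to the previous level."""
    path = [value]
    for step in steps:
        path.append(step(path[-1]))
    return path

# Tree style: each generalized value lists the original values it covers.
city_tree = {"France": ["Paris", "Lyon"], "Germany": ["Berlin"]}

def generalize_with_tree(value, tree):
    """Return the generalized value whose list contains `value`, or None."""
    for parent, children in tree.items():
        if value in children:
            return parent
    return None

age_path = generalize_with_lambdas(37, age_steps)
city_parent = generalize_with_tree("Lyon", city_tree)

print(age_path)     # [37, 30, '*']
print(city_parent)  # France
```

The lambda style suits values with a computable rule, such as numeric ranges, while the tree style suits categorical values that need an explicit mapping.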

get_height_of_node(node_value: any)

Get the generalization level for a specific node.

Parameters:

node_value (any) – The value of the node to inspect.

Returns:

int – Generalization level of the input node.
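A minimal sketch of what a level lookup means, using plain tuples instead of the hierarchy DataFrame. It assumes each value appears at a single level only; the function name, the sample data, and the choice to raise ValueError for unknown values are illustrative and not taken from the library.

```python
# Each tuple is one row: (level 0 raw value, level 1, level 2).
rows = [
    ("Paris", "France", "*"),
    ("Lyon", "France", "*"),
    ("Berlin", "Germany", "*"),
]

def height_of_node(rows, node_value):
    """Return the column index (generalization level) where node_value occurs."""
    for row in rows:
        for level, value in enumerate(row):
            if value == node_value:
                return level
    # Assumption: unknown values are an error in this sketch.
    raise ValueError(f"{node_value!r} is not in the hierarchy")

print(height_of_node(rows, "Paris"))   # 0
print(height_of_node(rows, "France"))  # 1
print(height_of_node(rows, "*"))       # 2
```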

get_leaves_under_node(node_value: any)

Get all leaves (values at level 0) under the given node.

Parameters:

node_value (any) – The value of the node to inspect.

Returns:

list – A list of leaves.
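The idea of collecting the leaves under a node can be sketched as follows, again with plain tuples standing in for the hierarchy DataFrame. The function name and sample data are hypothetical, and treating a leaf as being under itself is an assumption of this sketch.

```python
# Each tuple is one row: (level 0 raw value, level 1, level 2).
rows = [
    ("Paris", "France", "*"),
    ("Lyon", "France", "*"),
    ("Berlin", "Germany", "*"),
]

def leaves_under_node(rows, node_value):
    """Collect the level-0 value of every row whose path contains node_value."""
    leaves = []
    for row in rows:
        if node_value in row and row[0] not in leaves:
            leaves.append(row[0])
    return leaves

print(leaves_under_node(rows, "France"))  # ['Paris', 'Lyon']
print(leaves_under_node(rows, "*"))       # ['Paris', 'Lyon', 'Berlin']
```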

get_lowest_common_ancestor(node_values: list, get_type: Literal['value', 'height'] = 'value')

Find the lowest common ancestor (LCA) of the given nodes.

The LCA is the least-generalized value that covers every value in the group, so it can replace all of them.

Parameters:
  • node_values (list) – The list of values to inspect.

  • get_type ({'value', 'height'}, default 'value') – Whether to return the LCA value or its generalization level.

Returns:

str or int – The LCA value or its height (generalization level).
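One way to compute an LCA over a table-shaped hierarchy is sketched below. It is not the library's implementation: the function name is hypothetical, rows are plain tuples rather than a DataFrame, and the sketch assumes a non-empty list of hashable values that each appear at a single level.

```python
# Each tuple is one row: (level 0 raw value, level 1, level 2).
rows = [
    ("Paris", "France", "*"),
    ("Lyon", "France", "*"),
    ("Berlin", "Germany", "*"),
]

def lowest_common_ancestor(rows, node_values, get_type="value"):
    """Return the lowest value (or its level) shared by all given nodes."""
    wanted = set(node_values)
    # Rows whose leaf-to-root path passes through at least one requested node.
    paths = [row for row in rows if wanted & set(row)]
    missing = wanted - {value for row in paths for value in row}
    if missing:
        raise ValueError(f"not in the hierarchy: {sorted(map(str, missing))}")
    # The LCA cannot sit below the highest requested node.
    start = max(row.index(v) for row in paths for v in wanted if v in row)
    for level in range(start, len(rows[0])):
        candidates = {row[level] for row in paths}
        if len(candidates) == 1:  # every path agrees at this level
            return candidates.pop() if get_type == "value" else level
    raise ValueError("the given nodes share no common ancestor")

print(lowest_common_ancestor(rows, ["Paris", "Lyon"]))              # France
print(lowest_common_ancestor(rows, ["Paris", "Berlin"]))            # *
print(lowest_common_ancestor(rows, ["Paris", "Berlin"], "height"))  # 2
```

Scanning levels from the bottom up and stopping at the first level where all paths agree is what makes the result the lowest common ancestor rather than just any common one.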

Attributes

  • height – Height of this hierarchy.

  • hierarchy_df – The underlying hierarchy mapping DataFrame.

  • leaves – List of leaves (values at level 0).

  • name – Name of this hierarchy.

height

Height of this hierarchy.

Returns:

int

hierarchy_df

The underlying hierarchy mapping DataFrame.

Returns:

ITableDF

leaves

List of leaves (values at level 0).

Returns:

list

name

Name of this hierarchy.

Returns:

str