Hierarchy
- class Hierarchy(name: str, hierarchy_df: DataFrame)
Bases: object
Attribute’s generalization hierarchy.
This class stores the generalization mapping of a quasi-identifier (QID) attribute and provides utility methods for querying properties of the hierarchy and its nodes.
- Parameters:
name (str) – The identifier for the hierarchy (usually the column name).
hierarchy_df (pd.DataFrame) – A DataFrame where column 0 is the raw data and subsequent columns are progressively more generalized levels.
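To make the expected layout of hierarchy_df concrete, the sketch below models it as a plain list of rows (a hypothetical stand-in, so nothing is assumed about the class internals) for a three-level age hierarchy. It also checks the tree property one would normally expect of such a table: every value at one level maps to exactly one value at the next level. The helper is_well_formed is illustrative and not part of this API.

```python
# Hypothetical stand-in for hierarchy_df: one row per raw value.
# Column 0 holds the raw data; each later column is one level more general.
ROWS = [
    ["21", "20-29", "*"],
    ["25", "20-29", "*"],
    ["34", "30-39", "*"],
    ["38", "30-39", "*"],
]


def is_well_formed(rows):
    """Return True if every value at level k has exactly one parent at level k + 1."""
    n_levels = len(rows[0])
    if any(len(row) != n_levels for row in rows):
        return False  # ragged table: some row is missing a level
    for k in range(n_levels - 1):
        parent = {}
        for row in rows:
            # setdefault keeps the first parent seen; a different one means a conflict
            if parent.setdefault(row[k], row[k + 1]) != row[k + 1]:
                return False
    return True


print(is_well_formed(ROWS))  # True
print(is_well_formed(ROWS + [["40", "20-29", "#"]]))  # False: "20-29" gets two parents
```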
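Methods
contains(node_value) – Check whether a node exists anywhere in the hierarchy.
from_csv(name, path[, sep]) – Initialize from a CSV file.
from_json(name, org_column, json_path) – Initialize from a JSON configuration file.
get_height_of_node(node_value) – Get the generalization level for a specific node.
get_leaves_under_node(node_value) – Get all leaves (values at level 0) under the given node.
get_lowest_common_ancestor(node_values[, get_type]) – Find the lowest common ancestor (LCA) of the given nodes.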
- contains(node_value: any)
Check whether a node exists anywhere in the hierarchy.
- Parameters:
node_value (any) – The value of the node to inspect.
- Returns:
bool
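A minimal sketch of the membership test, assuming the hierarchy is the list-of-rows stand-in shown here (not the class's real storage): a node is contained if it equals the value in any column of any row. The function name has_node is hypothetical. In this sketch matching is by equality, so types matter: the string "25" does not match the integer 25.

```python
# Hypothetical list-of-rows stand-in for hierarchy_df (column 0 = raw data).
ROWS = [
    ["21", "20-29", "*"],
    ["25", "20-29", "*"],
    ["34", "30-39", "*"],
    ["38", "30-39", "*"],
]


def has_node(rows, node_value):
    """Return True if node_value appears in any column (level) of any row."""
    return any(node_value in row for row in rows)


print(has_node(ROWS, "30-39"))  # True (level 1)
print(has_node(ROWS, "40-49"))  # False
```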
- classmethod from_csv(name: str, path: str, sep: str = ',')
Initialize from a CSV file.
The CSV file must have no header row: the first column holds the raw data and each following column holds a higher level of generalization.
- Parameters:
name (str) – Hierarchy name.
path (str) – The path of the csv file.
sep (str, default ',') – Separator string (delimiter) of the csv file.
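The sketch below illustrates the expected file layout by parsing a headerless, semicolon-separated hierarchy with Python's standard csv module. This is an assumption for illustration only: the class may read the file differently (for example with pandas), and csv.reader requires a single-character delimiter, whereas other readers may accept longer separators. The helper read_hierarchy_rows is hypothetical, and every value comes back as a string.

```python
import csv
import io

# A headerless hierarchy file using ";" as the separator (inline here for the demo).
CSV_TEXT = "21;20-29;*\n25;20-29;*\n34;30-39;*\n38;30-39;*\n"


def read_hierarchy_rows(text, sep=","):
    """Parse headerless CSV text into rows: column 0 raw, later columns more general."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    return [row for row in reader if row]  # skip blank lines


rows = read_hierarchy_rows(CSV_TEXT, sep=";")
print(rows[0])  # ['21', '20-29', '*']
print(len(rows))  # 4
```

With the class itself, the corresponding call for such a file would be Hierarchy.from_csv("age", "age_hierarchy.csv", sep=";"), where the file name is hypothetical.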
- classmethod from_json(name: str, org_column: DataFrame, json_path: str)
Initialize from a JSON configuration file.
Supports two types of definition for generalization:
lambda: Apply lambda functions to derive the next generalization level based on the current value.
tree: Explicitly map a list of original values to a generalized value.
- Parameters:
name (str) – Hierarchy name.
org_column (pd.DataFrame) – The column from the original dataset to use as level 0.
json_path (str) – Path to the JSON configuration file.
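The JSON schema is not documented here, so this sketch does not reproduce the file format. It only shows what the two definition types compute, using plain Python: a "lambda"-style step derives the next level from the current value, and a "tree"-style step looks each value up in an explicit mapping from lists of original values to a generalized value. The names ages, tree, steps, and build_rows are hypothetical.

```python
# Level-0 values: a stand-in for org_column.
ages = [21, 25, 34, 38]

# "tree"-style step: explicitly map lists of original values to a generalized value.
tree = {"*": ["20-29", "30-39"]}
tree_lookup = {child: parent for parent, children in tree.items() for child in children}

# One step per generalization level above level 0.
steps = [
    # "lambda"-style step: derive the next level from the current value.
    lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",
    lambda v: tree_lookup[v],  # KeyError if a value is missing from the tree
]


def build_rows(values, steps):
    """Apply each step to the previous level's value, producing one row per value."""
    rows = []
    for value in values:
        row = [value]
        for step in steps:
            row.append(step(row[-1]))
        rows.append(row)
    return rows


for row in build_rows(ages, steps):
    print(row)
# [21, '20-29', '*']
# [25, '20-29', '*']
# [34, '30-39', '*']
# [38, '30-39', '*']
```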
- get_height_of_node(node_value: any)
Get the generalization level for a specific node.
- Parameters:
node_value (any) – The value of the node to inspect.
- Returns:
int – Generalization level of the input node.
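Using the same hypothetical list-of-rows stand-in, a node's generalization level is the index of the column it appears in. The sketch below returns the lowest such column if a value happens to occur at several levels, and raises ValueError for unknown values; both behaviors are assumptions, and the real method may handle these cases differently. The name height_of_node is hypothetical.

```python
# Hypothetical list-of-rows stand-in for hierarchy_df (column 0 = raw data).
ROWS = [
    ["21", "20-29", "*"],
    ["25", "20-29", "*"],
    ["34", "30-39", "*"],
    ["38", "30-39", "*"],
]


def height_of_node(rows, node_value):
    """Return the first (lowest) column index in which node_value appears."""
    for level in range(len(rows[0])):
        if any(row[level] == node_value for row in rows):
            return level
    raise ValueError(f"{node_value!r} is not in the hierarchy")


print(height_of_node(ROWS, "25"))  # 0
print(height_of_node(ROWS, "30-39"))  # 1
print(height_of_node(ROWS, "*"))  # 2
```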
- get_leaves_under_node(node_value: any)
Get all leaves (values at level 0) under the given node.
- Parameters:
node_value (any) – The value of the node to inspect.
- Returns:
list – A list of leaves.
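Because each row runs from a raw value up to the most general level, the leaves under a node are the level-0 values of the rows that contain that node. A minimal sketch on the hypothetical list-of-rows stand-in; the name leaves_under_node is hypothetical, and treating a leaf as its own only leaf is an assumption of this sketch.

```python
# Hypothetical list-of-rows stand-in for hierarchy_df (column 0 = raw data).
ROWS = [
    ["21", "20-29", "*"],
    ["25", "20-29", "*"],
    ["34", "30-39", "*"],
    ["38", "30-39", "*"],
]


def leaves_under_node(rows, node_value):
    """Collect the level-0 value of every row in which node_value appears."""
    leaves = []
    for row in rows:
        if node_value in row and row[0] not in leaves:
            leaves.append(row[0])  # keep first-seen order, drop duplicates
    return leaves


print(leaves_under_node(ROWS, "20-29"))  # ['21', '25']
print(leaves_under_node(ROWS, "*"))  # ['21', '25', '34', '38']
print(leaves_under_node(ROWS, "34"))  # ['34']
```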
- get_lowest_common_ancestor(node_values: list, get_type: Literal['value', 'height'] = 'value')
Find the lowest common ancestor (LCA) of the given nodes.
It is used to find the least generalized value that covers all of the given values, so that replacing each of them with the LCA makes them indistinguishable.
- Parameters:
node_values (list) – The list of values to inspect.
get_type ({'value', 'height'}, default 'value') – Whether to return the LCA value or its generalization level.
- Returns:
str or int – The LCA value or its height (generalization level).
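One way to compute the LCA on the hypothetical list-of-rows stand-in: collect the rows covered by the given nodes, then walk up the levels until all covered rows share a single value. The search starts at the level of the highest given node, since the LCA cannot sit below any of them; this also handles a node that has only one child. The name lowest_common_ancestor is hypothetical and this is a sketch, not the class's actual implementation.

```python
# Hypothetical list-of-rows stand-in for hierarchy_df (column 0 = raw data).
ROWS = [
    ["21", "20-29", "*"],
    ["25", "20-29", "*"],
    ["34", "30-39", "*"],
    ["38", "30-39", "*"],
]


def lowest_common_ancestor(rows, node_values, get_type="value"):
    """Return the LCA of node_values (get_type="value") or its level (get_type="height")."""
    if get_type not in ("value", "height"):
        raise ValueError("get_type must be 'value' or 'height'")
    n_levels = len(rows[0])

    def height(v):
        # Lowest column in which v appears.
        for level in range(n_levels):
            if any(row[level] == v for row in rows):
                return level
        raise ValueError(f"{v!r} is not in the hierarchy")

    heights = {v: height(v) for v in node_values}
    # Leaves covered by the given nodes: rows where some node sits at its own level.
    covered = [row for row in rows if any(row[h] == v for v, h in heights.items())]
    # The LCA cannot sit below the highest given node, so start the search there.
    for level in range(max(heights.values()), n_levels):
        candidates = {row[level] for row in covered}
        if len(candidates) == 1:
            return candidates.pop() if get_type == "value" else level
    raise ValueError("the given nodes share no common ancestor")


print(lowest_common_ancestor(ROWS, ["21", "25"]))  # 20-29
print(lowest_common_ancestor(ROWS, ["21", "25"], get_type="height"))  # 1
print(lowest_common_ancestor(ROWS, ["21", "34"]))  # *
print(lowest_common_ancestor(ROWS, ["20-29", "25"]))  # 20-29
```

If a hierarchy's top level holds more than one value, two nodes from different top-level branches have no common ancestor, and the sketch raises ValueError in that case.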
Attributes
height – Height of this hierarchy.
hierarchy_df – The underlying hierarchy mapping DataFrame.
leaves – List of leaves (values at level 0).
name – Name of this hierarchy.
- height
Height of this hierarchy.
- Returns:
int
- hierarchy_df
The underlying hierarchy mapping DataFrame.
- Returns:
ITableDF
- leaves
List of leaves (values at level 0).
- Returns:
list
- name
Name of this hierarchy.
- Returns:
str