MLClassificationPerformance


class MLClassificationPerformance(model, df: DataFrame, feature_names: list, target_name: str, split_ratio: float = 0.2, test_df: DataFrame = None, seed: int = None)

Bases: object

Evaluation based on machine learning classification performance.

Parameters:

model – A machine learning classifier (e.g., RandomForest, SVC).
df (DataFrame) – The data to be evaluated.
feature_names (list) – A list of column names to be used as features (X).
target_name (str) – The column name of the prediction target (y).
split_ratio (float) – The fraction of the data to be used for testing (default 0.2).
test_df (DataFrame) – An optional separate DataFrame to be used for testing. If provided, split_ratio is ignored and the whole of df is used for training.
seed (int) – Random seed for reproducibility in data splitting and model training.
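How split_ratio and seed interact can be sketched as follows. This is a minimal, dependency-free illustration, not the class's actual implementation: the helper name split_indices is hypothetical, and rounding the test size to the nearest row is an assumption.

```python
import random


def split_indices(n_rows, split_ratio=0.2, seed=None):
    """Return (train_idx, test_idx) for a shuffled split of n_rows rows.

    Hypothetical helper: split_ratio is the fraction of rows held out for
    testing, and seed makes the shuffle reproducible.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be between 0 and 1")
    indices = list(range(n_rows))
    # A local Random instance keeps the global RNG state untouched.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * split_ratio)  # assumption: round to nearest row
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10, split_ratio=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 8 2
```

Calling split_indices again with the same seed returns the same partition, which is the reproducibility the seed parameter is meant to provide.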

Variables:

X_train (DataFrame) – The processed training features.
X_test (DataFrame) – The processed testing features.
y_train (array-like) – The encoded training target labels.
y_test (array-like) – The encoded testing target labels.
metrics (dict) – A dictionary containing averaged accuracy, precision, recall, and F1.
raw_metrics (dict) – A dictionary containing raw (per-class) accuracy, precision, recall, and F1.
classification_report (str) – The text report of the classification results.
confusion_matrix (ndarray) – The matrix showing true vs. predicted classifications.
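The layout of a confusion matrix can be sketched in plain Python. This sketch assumes the scikit-learn convention (rows are true classes, columns are predicted classes) and integer-encoded labels like y_test; whether this class follows that convention is an assumption, and the function here returns nested lists rather than an ndarray.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2]))
# [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
```

Diagonal cells count correct predictions; the off-diagonal 1 in row 0 is a class-0 sample predicted as class 1.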

See also

MLClassifierExample: Set of example machine learning classifiers.

Methods

`_compute_metrics`	Calculate statistical performance metrics.
`_predict`	Train the model and generate predictions on the test set.
`_set_X_y_test_from_test_df`	Preprocess the provided external test data.
`evaluate`	Execute the model evaluation workflow.
`update_df`	Change and preprocess the input data.

_compute_metrics(y_test, y_pred, preview)

Calculate statistical performance metrics.

Parameters:

y_test – Ground truth labels.
y_pred – Predicted labels.
preview – If True, prints a classification report.

Returns:

tuple – A tuple containing raw metrics (per class), averaged metrics, the text report, and the confusion matrix.
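The split between per-class ("raw") and averaged metrics can be sketched without dependencies. Two assumptions are made here: averaging is the unweighted (macro) mean across classes, and accuracy is reported only in the averaged dictionary. The real method may use a different averaging scheme (for example, support-weighted) and a different return layout.

```python
def compute_metrics(y_test, y_pred):
    """Return (per_class, averaged) metric dicts (hypothetical sketch).

    per_class maps each label to its precision, recall and F1; averaged
    holds overall accuracy plus the macro mean of the per-class scores.
    """
    labels = sorted(set(y_test) | set(y_pred))
    pairs = list(zip(y_test, y_pred))
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    n = len(labels)
    averaged = {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(m["precision"] for m in per_class.values()) / n,
        "recall": sum(m["recall"] for m in per_class.values()) / n,
        "f1": sum(m["f1"] for m in per_class.values()) / n,
    }
    return per_class, averaged


per_class, averaged = compute_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(round(averaged["accuracy"], 2), round(averaged["f1"], 3))  # 0.8 0.822
```

Macro averaging gives every class equal weight, so a poorly predicted minority class pulls the averaged scores down even when overall accuracy is high.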

_predict()

Train the model and generate predictions on the test set.

_set_X_y_test_from_test_df()

Preprocess the provided external test data.
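External test data has to be encoded onto the same feature columns as the training data, or the fitted model receives inputs of the wrong shape. A minimal sketch of that alignment follows, using plain dicts in place of DataFrame rows. The helper name encode_test_rows, the "feature=value" column naming, and mapping unseen categories to all zeros are assumptions for illustration, not documented behavior of this class.

```python
def encode_test_rows(rows, categorical, train_columns):
    """One-hot encode `rows` (list of dicts) onto the fixed `train_columns`.

    train_columns holds names such as "sex=F" produced when encoding the
    training data (hypothetical naming). Categories never seen in training
    leave their one-hot columns at zero.
    """
    encoded = []
    for row in rows:
        vector = dict.fromkeys(train_columns, 0)
        for name, value in row.items():
            if name in categorical:
                key = f"{name}={value}"
                if key in vector:  # unseen category: leave all zeros
                    vector[key] = 1
            else:
                vector[name] = value  # numerical feature passed through
        encoded.append([vector[c] for c in train_columns])
    return encoded


train_columns = ["age", "sex=F", "sex=M"]
test_rows = [{"age": 30, "sex": "M"}, {"age": 41, "sex": "X"}]
print(encode_test_rows(test_rows, {"sex"}, train_columns))
# [[30, 0, 1], [41, 0, 0]]
```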

evaluate(preview=False, restart=False)

Execute the model evaluation workflow.

Fits the model (if not already trained), predicts, and stores classification results.

Parameters:

preview (bool, default False) – Whether to print the classification report.
restart (bool, default False) – If True, forces the model to re-train and re-predict.
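The "fit only if not already trained, unless restart=True" behavior can be sketched as a small caching pattern. Everything below is hypothetical: MajorityClassifier is a toy model assuming the scikit-learn fit/predict convention, and EvaluationSketch (with its fit_count counter and accuracy return value) only illustrates the control flow, not this class's internals.

```python
class MajorityClassifier:
    """Toy model with a scikit-learn-style fit/predict interface."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)  # most frequent training label
        return self

    def predict(self, X):
        return [self.label_] * len(X)


class EvaluationSketch:
    """Hypothetical illustration of evaluate(preview, restart) caching."""

    def __init__(self, model, X_train, y_train, X_test, y_test):
        self.model = model
        self.X_train, self.y_train = X_train, y_train
        self.X_test, self.y_test = X_test, y_test
        self.y_pred = None  # None means "not trained/predicted yet"
        self.fit_count = 0

    def evaluate(self, preview=False, restart=False):
        # Train and predict only on the first call or when restart is forced.
        if self.y_pred is None or restart:
            self.model.fit(self.X_train, self.y_train)
            self.fit_count += 1
            self.y_pred = self.model.predict(self.X_test)
        correct = sum(t == p for t, p in zip(self.y_test, self.y_pred))
        accuracy = correct / len(self.y_test)
        if preview:
            print(f"accuracy: {accuracy:.2f}")
        return accuracy


ev = EvaluationSketch(MajorityClassifier(), [[1], [2], [3]], [0, 1, 1], [[4], [5]], [1, 0])
ev.evaluate(preview=True)  # trains once and prints "accuracy: 0.50"
ev.evaluate()              # reuses cached predictions, no refit
ev.evaluate(restart=True)  # forces a refit
print(ev.fit_count)        # 2
```

Caching the predictions makes repeated calls cheap, while restart gives an explicit way to retrain, for example after the input data has changed.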

update_df(df: DataFrame)

Change and preprocess the input data.

This method performs the following:

Categorical QIDs are one-hot encoded.
Numerical QIDs are preserved as-is.
The target variable is label encoded.
The data is split into training and testing sets if no test_df is provided.
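The preprocessing steps above can be sketched without pandas or scikit-learn, using a list of dicts in place of the DataFrame. The helper name preprocess and the "feature=value" column names are hypothetical (pandas.get_dummies, for instance, uses "feature_value"). Sorting the class labels before assigning integer codes mirrors how scikit-learn's LabelEncoder orders classes_, but whether this class uses LabelEncoder is an assumption.

```python
def preprocess(rows, categorical, numerical, target):
    """Return (column_names, X, y, class_labels) for a list of dict rows."""
    # Sorted category vocabulary per categorical feature -> one-hot columns.
    vocab = {c: sorted({row[c] for row in rows}) for c in categorical}
    columns = list(numerical) + [f"{c}={v}" for c in categorical for v in vocab[c]]
    X = []
    for row in rows:
        vector = [row[c] for c in numerical]  # numerical features kept as-is
        for c in categorical:
            vector += [1 if row[c] == v else 0 for v in vocab[c]]  # one-hot
        X.append(vector)
    # Label encoding: map each distinct target value to an integer code.
    class_labels = sorted({row[target] for row in rows})
    code = {label: i for i, label in enumerate(class_labels)}
    y = [code[row[target]] for row in rows]
    return columns, X, y, class_labels


rows = [
    {"age": 34, "sex": "F", "income": "high"},
    {"age": 51, "sex": "M", "income": "low"},
    {"age": 29, "sex": "F", "income": "low"},
]
columns, X, y, classes = preprocess(rows, categorical=["sex"], numerical=["age"], target="income")
print(columns)  # ['age', 'sex=F', 'sex=M']
print(X)        # [[34, 1, 0], [51, 0, 1], [29, 1, 0]]
print(y)        # [0, 1, 1]
```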

Parameters:

df (DataFrame) – The data to be evaluated.
