MLClassificationPerformance

class MLClassificationPerformance(model, df: DataFrame, feature_names: list, target_name: str, split_ratio: float = 0.2, test_df: DataFrame = None, seed: int = None)

Bases: object

Evaluate data by training a machine learning classifier on it and measuring the classifier's performance.

Parameters:
  • model – A machine learning classifier (e.g., RandomForest, SVC).

  • df (DataFrame) – The data to be evaluated.

  • feature_names (list) – A list of column names to be used as features (X).

  • target_name (str) – The column name of the prediction target (y).

  • split_ratio (float) – The fraction of the data to be used for testing (default 0.2).

  • test_df (DataFrame) – An optional, separate dataset to be used for testing. If provided, split_ratio is ignored and the whole df is used for training.

  • seed (int) – Random seed for reproducibility in data splitting and model training.
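The following is a rough illustration of how split_ratio and seed interact. It is a minimal sketch that assumes the split is made with scikit-learn's train_test_split (test_size=split_ratio) and that the same seed is passed to the model's random_state. The toy data and the split_and_predict helper are hypothetical and are not part of this class.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy data; not taken from the library.
df = pd.DataFrame({
    "age": [23, 35, 47, 52, 29, 41, 38, 60, 33, 45],
    "hours": [40, 50, 45, 30, 38, 60, 42, 20, 48, 36],
    "income": ["low", "high", "high", "low", "low", "high", "high", "low", "low", "high"],
})
feature_names = ["age", "hours"]
target_name = "income"


def split_and_predict(split_ratio=0.2, seed=None):
    """Hypothetical helper: seed controls both the split and the model."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_names], df[target_name], test_size=split_ratio, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=10, random_state=seed)
    model.fit(X_train, y_train)
    return X_test.index.tolist(), model.predict(X_test).tolist()


run_a = split_and_predict(split_ratio=0.2, seed=42)
run_b = split_and_predict(split_ratio=0.2, seed=42)
print(len(run_a[0]))   # 2 test rows (20% of 10)
print(run_a == run_b)  # True: same seed gives the same split and predictions
```

When seed is left as None, both the split and the forest are randomized, so repeated evaluations can give slightly different scores.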

Variables:
  • X_train (DataFrame) – The processed training features.

  • X_test (DataFrame) – The processed testing features.

  • y_train (array-like) – The encoded training target labels.

  • y_test (array-like) – The encoded testing target labels.

  • metrics (dict) – A dictionary containing the averaged accuracy, precision, recall, and F1 score.

  • raw_metrics (dict) – A dictionary containing the raw (per-class) accuracy, precision, recall, and F1 score.

  • classification_report (str) – The text report on the classification results.

  • confusion_matrix (ndarray) – The matrix showing true vs. predicted classifications.
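The orientation of "true vs. predicted" matters when reading confusion_matrix. The sketch below assumes the matrix is produced by scikit-learn's confusion_matrix. In that case, row i holds samples whose true label is i and column j holds samples predicted as j, so the diagonal counts correct predictions.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes.
print(cm)
# [[1 1 0]
#  [0 2 1]
#  [0 0 1]]
print(cm.trace())  # 4 correct predictions out of 6
```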

See also

MLClassifierExample

Set of example machine learning classifiers.

Methods

_compute_metrics

Calculate statistical performance metrics.

_predict

Train the model and generate predictions on the test set.

_set_X_y_test_from_test_df

Preprocess the provided external test data.

evaluate

Execute the model evaluation workflow.

update_df

Change and preprocess the input data.

_compute_metrics(y_test, y_pred, preview)

Calculate statistical performance metrics.

Parameters:
  • y_test – Ground truth labels.

  • y_pred – Predicted labels.

  • preview – If True, prints a classification report.

Returns:

tuple – A tuple containing raw metrics (per class), averaged metrics, the text report, and the confusion matrix.
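The sketch below approximates the documented return tuple with scikit-learn. It makes several assumptions. Per-class values come from precision_recall_fscore_support(average=None). The averaged values use macro averaging, which may differ from the class's actual choice. Accuracy, which is a single overall score, is stored in both dictionaries. compute_metrics is a hypothetical stand-in for _compute_metrics.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)


def compute_metrics(y_test, y_pred, preview=False):
    """Hypothetical stand-in returning (raw, averaged, report, confusion matrix)."""
    accuracy = accuracy_score(y_test, y_pred)
    # Per-class precision, recall, and F1: one value per label.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average=None, zero_division=0
    )
    raw_metrics = {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
    # Averaged values; macro averaging is an assumption of this sketch.
    p_avg, r_avg, f1_avg, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    metrics = {"accuracy": accuracy, "precision": p_avg, "recall": r_avg, "f1": f1_avg}
    report = classification_report(y_test, y_pred, zero_division=0)
    if preview:
        print(report)
    return raw_metrics, metrics, report, confusion_matrix(y_test, y_pred)


raw, avg, report, cm = compute_metrics([0, 0, 1, 1], [0, 1, 1, 1], preview=True)
print(raw["accuracy"])  # 0.75
```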

_predict()

Train the model and generate predictions on the test set.

_set_X_y_test_from_test_df()

Preprocess the provided external test data.
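External test data must be encoded consistently with the training data. This matters most when one-hot encoding yields different columns for the two sets. The following is a minimal sketch assuming pandas get_dummies for features and a LabelEncoder fitted on the training target. It also assumes test columns are aligned with reindex, so unseen categories are dropped and missing ones are filled with 0. The variable names are hypothetical and are not the class's internals.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

train_df = pd.DataFrame({"sex": ["F", "M", "F"], "age": [30, 40, 50], "y": ["a", "b", "a"]})
test_df = pd.DataFrame({"sex": ["M", "X"], "age": [35, 45], "y": ["b", "a"]})
feature_names, target_name = ["sex", "age"], "y"

# Encode training features and fit the target encoder on training labels.
X_train = pd.get_dummies(train_df[feature_names])
encoder = LabelEncoder().fit(train_df[target_name])

# Encode test features the same way, then align them to the training columns:
# the unseen category "X" is dropped and the absent "sex_F" column is filled with 0.
X_test = pd.get_dummies(test_df[feature_names]).reindex(columns=X_train.columns, fill_value=0)

# Reuse the training encoder; an unseen test label would raise a ValueError here.
y_test = encoder.transform(test_df[target_name])
print(X_test.columns.tolist())  # ['age', 'sex_F', 'sex_M']
```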

evaluate(preview=False, restart=False)

Execute the model evaluation workflow.

Fits the model (if not already trained), predicts, and stores classification results.

Parameters:
  • preview (bool, default False) – Whether to print the classification report.

  • restart (bool, default False) – If True, forces the model to re-train and re-predict.
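The "fit once unless restart=True" behaviour described above can be illustrated with the small class below. TinyEvaluator, its fit_count counter, and the accuracy-only metrics are hypothetical simplifications, not the actual implementation. The sketch only shows the caching logic.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


class TinyEvaluator:
    """Hypothetical stand-in illustrating fit-once / restart behaviour."""

    def __init__(self, model, X_train, X_test, y_train, y_test):
        self.model = model
        self.X_train, self.X_test = X_train, X_test
        self.y_train, self.y_test = y_train, y_test
        self.y_pred = None
        self.metrics = None
        self.fit_count = 0  # counts training runs, for illustration only

    def _predict(self):
        # Train the model and predict on the test set.
        self.model.fit(self.X_train, self.y_train)
        self.fit_count += 1
        return self.model.predict(self.X_test)

    def evaluate(self, preview=False, restart=False):
        # Train only if there are no cached predictions or a restart is forced.
        if self.y_pred is None or restart:
            self.y_pred = self._predict()
        self.metrics = {"accuracy": accuracy_score(self.y_test, self.y_pred)}
        if preview:
            print(self.metrics)


ev = TinyEvaluator(DecisionTreeClassifier(random_state=0),
                   [[0], [1], [2], [3]], [[0], [3]], [0, 0, 1, 1], [0, 1])
ev.evaluate()
ev.evaluate()              # reuses cached predictions; no retraining
ev.evaluate(restart=True)  # forces retraining
print(ev.fit_count)        # 2
```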

update_df(df: DataFrame)

Change and preprocess the input data.

This method performs the following:

  1. Categorical QIDs are one-hot encoded.

  2. Numerical QIDs are preserved.

  3. The target variable is label encoded.

  4. The data is split into training and testing sets if no test_df is provided.

Parameters:

df (DataFrame) – The data to be evaluated.
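The four preprocessing steps of update_df can be approximated with pandas and scikit-learn. This is a minimal sketch under several assumptions. Categorical columns are detected by dtype, which is pd.get_dummies' default, so numeric columns pass through unchanged. The target is encoded with LabelEncoder, and train_test_split performs the split. The preprocess function and its has_test_df flag are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def preprocess(df, feature_names, target_name, split_ratio=0.2, seed=None, has_test_df=False):
    """Hypothetical sketch of the documented update_df steps."""
    # Steps 1-2: one-hot encode categorical columns; numeric columns are kept as-is.
    X = pd.get_dummies(df[feature_names])
    # Step 3: label-encode the target into integers 0..k-1.
    y = LabelEncoder().fit_transform(df[target_name])
    # Step 4: split only when no separate test set is supplied.
    if has_test_df:
        return X, None, y, None
    return train_test_split(X, y, test_size=split_ratio, random_state=seed)


df = pd.DataFrame({
    "zip": ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C"],
    "age": [25, 32, 47, 51, 38, 29, 44, 60, 35, 41],
    "label": ["no", "yes"] * 5,
})
X_train, X_test, y_train, y_test = preprocess(df, ["zip", "age"], "label", seed=0)
print(X_train.columns.tolist())   # ['age', 'zip_A', 'zip_B', 'zip_C']
print(len(X_train), len(X_test))  # 8 2
```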