MLClassificationPerformance
- class MLClassificationPerformance(model, df: DataFrame, feature_names: list, target_name: str, split_ratio: float = 0.2, test_df: DataFrame = None, seed: int = None)
Bases: object
Evaluation based on machine learning classification performance.
- Parameters:
model – A machine learning classifier (e.g., RandomForest, SVC).
df (DataFrame) – The data to be evaluated.
feature_names (list) – A list of column names to be used as features (X).
target_name (str) – The column name of the prediction target (y).
split_ratio (float) – The fraction of the data to be used for testing (default 0.2).
test_df (DataFrame) – An optional separate dataset to be used for testing. If provided, split_ratio is ignored and the whole df is used for training.
seed (int) – Random seed for reproducibility in data splitting and model training.
- Variables:
X_train (DataFrame) – The processed training features.
X_test (DataFrame) – The processed testing features.
y_train (array-like) – The encoded training target labels.
y_test (array-like) – The encoded testing target labels.
metrics (dict) – A dictionary containing averaged accuracy, precision, recall, and F1.
raw_metrics (dict) – A dictionary containing raw accuracy, precision, recall, and F1.
classification_report (str) – The text report on the classification results.
confusion_matrix (ndarray) – The matrix showing true vs. predicted classifications.
See also
MLClassifierExample – Set of example machine learning classifiers.
Methods
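_compute_metrics(y_test, y_pred, preview) – Calculate statistical performance metrics.
_predict() – Train the model and generate predictions on the test set.
_set_X_y_test_from_test_df() – Preprocess the provided external test data.
evaluate(preview=False, restart=False) – Execute the model evaluation workflow.
update_df(df) – Change and preprocess the input data.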
- _compute_metrics(y_test, y_pred, preview)
Calculate statistical performance metrics.
- Parameters:
y_test – Ground truth labels.
y_pred – Predicted labels.
preview – If True, prints a classification report.
- Returns:
tuple – A tuple containing raw metrics (per class), averaged metrics, the text report, and the confusion matrix.
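The relationship between the raw (per class) and averaged metrics can be illustrated with a minimal, dependency-free sketch. The function name `compute_metrics`, the use of an unweighted (macro) average, and the omission of the text report are assumptions made for illustration; the library's actual averaging scheme is not stated in this reference. The sketch also assumes non-empty inputs.

```python
from collections import Counter


def compute_metrics(y_test, y_pred):
    """Illustrative only: return (raw, averaged, confusion) for two label sequences."""
    labels = sorted(set(y_test) | set(y_pred))
    pair_counts = Counter(zip(y_test, y_pred))

    # confusion[i][j] = samples whose true label is labels[i] and predicted label is labels[j]
    confusion = [[pair_counts[(t, p)] for p in labels] for t in labels]

    raw = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum: predicted as this class
        actual = sum(confusion[i])                    # row sum: truly this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        raw[label] = {"precision": precision, "recall": recall, "f1": f1}

    n = len(labels)
    averaged = {
        "accuracy": sum(confusion[i][i] for i in range(n)) / len(y_test),
        # Assumption: unweighted (macro) average over classes.
        "precision": sum(m["precision"] for m in raw.values()) / n,
        "recall": sum(m["recall"] for m in raw.values()) / n,
        "f1": sum(m["f1"] for m in raw.values()) / n,
    }
    return raw, averaged, confusion
```

For example, `compute_metrics([0, 0, 1, 1], [0, 1, 1, 1])` yields the confusion matrix `[[1, 1], [0, 2]]` and an accuracy of 0.75.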
- _predict()
Train the model and generate predictions on the test set.
- _set_X_y_test_from_test_df()
Preprocess the provided external test data.
- evaluate(preview=False, restart=False)
Execute the model evaluation workflow.
Fits the model (if not already trained), predicts, and stores classification results.
- Parameters:
preview (bool, default False) – Whether to print the classification report.
restart (bool, default False) – If True, forces the model to re-train and re-predict.
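The train-once behaviour described for `restart` can be sketched as follows. This is not the library's implementation: `MajorityClassifier` and `CachedEvaluation` are hypothetical stand-ins, only accuracy is computed, and returning the metrics from `evaluate` is a convenience of this sketch rather than documented behaviour.

```python
from collections import Counter


class MajorityClassifier:
    """Toy stand-in model: always predicts the most frequent training label."""

    def __init__(self):
        self.fit_calls = 0
        self.majority_ = None

    def fit(self, X, y):
        self.fit_calls += 1
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class CachedEvaluation:
    """Hypothetical sketch: train and predict once, then reuse the predictions."""

    def __init__(self, model, X_train, y_train, X_test, y_test):
        self.model = model
        self.X_train, self.y_train = X_train, y_train
        self.X_test, self.y_test = X_test, y_test
        self._y_pred = None  # None means the model has not been trained yet
        self.metrics = None

    def evaluate(self, preview=False, restart=False):
        # Re-train only on the first call or when restart is requested.
        if restart or self._y_pred is None:
            self.model.fit(self.X_train, self.y_train)
            self._y_pred = self.model.predict(self.X_test)
        correct = sum(t == p for t, p in zip(self.y_test, self._y_pred))
        self.metrics = {"accuracy": correct / len(self.y_test)}
        if preview:
            print(f"accuracy: {self.metrics['accuracy']:.3f}")
        return self.metrics
```

Calling `evaluate()` twice in this sketch fits the model only once; passing `restart=True` triggers a second fit.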
- update_df(df: DataFrame)
Change and preprocess the input data.
This method performs the following:
Categorical QIDs are One-Hot Encoded.
Numerical QIDs are preserved.
The target variable is Label Encoded.
Data is split into training and testing sets if a test_df is not provided.
- Parameters:
df (DataFrame) – The data to be evaluated.
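The preprocessing steps listed for `update_df` can be approximated with pandas and scikit-learn. This is a sketch under assumptions, not the library's code: the helper name `preprocess` is hypothetical, and the choice of `pandas.get_dummies`, `LabelEncoder`, and `train_test_split` is a guess at reasonable equivalents for the one-hot encoding, label encoding, and splitting described above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def preprocess(df, feature_names, target_name, split_ratio=0.2, seed=None):
    """Illustrative only: encode features and target, then split."""
    # One-hot encode non-numeric feature columns; numeric columns pass through unchanged.
    X = pd.get_dummies(df[feature_names])
    # Map target labels to integers 0..n_classes-1.
    y = LabelEncoder().fit_transform(df[target_name])
    # Hold out a split_ratio fraction of rows for testing; seed makes the split repeatable.
    return train_test_split(X, y, test_size=split_ratio, random_state=seed)


# Small made-up dataset, for demonstration only.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 52, 37, 44, 31, 26, 48],
    "city": ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"],
    "label": ["yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "no"],
})
X_train, X_test, y_train, y_test = preprocess(
    df, ["age", "city"], "label", split_ratio=0.2, seed=0
)
```

With ten rows and `split_ratio=0.2`, two rows are held out for testing, and the `city` column is expanded into `city_A` and `city_B`.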