MLutils module

class MLutils.Classifier(n_jobs=-1)

Bases: MLutils.Model

compute_metrics_and_graphs(pred, actual, output_path='outputs/plots/mlflow_artifacts')

Evaluate the model’s performance.

Parameters
  • pred (list) – List of predictions

  • actual (list) – List of labels

  • output_path (str, optional) – Path indicating where to store the plots. Defaults to “outputs/plots/mlflow_artifacts”.
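
The source does not state which metrics this method reports. As a minimal sketch, assuming binary labels and that accuracy, precision and recall are among the quantities of interest, comparing pred against actual could look like this. classification_metrics is a hypothetical helper, not part of MLutils, and it does not produce any plots.

```python
def classification_metrics(pred, actual, positive=1):
    """Hypothetical helper: accuracy, precision and recall for binary labels."""
    if not actual or len(pred) != len(actual):
        raise ValueError("pred and actual must be non-empty and of equal length")
    pairs = list(zip(pred, actual))
    # Confusion-matrix counts for the class treated as "positive".
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    correct = sum(1 for p, a in pairs if p == a)
    return {
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when the denominator is empty.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = classification_metrics(pred=[1, 0, 1, 1], actual=[1, 0, 0, 1])
```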

fit(x, y)

Fit a classifier using a cross-validated grid search.

Parameters
  • x (DataFrame) – Features

  • y (DataFrame) – Target
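
The source does not say which estimator or parameter grid fit searches over. The sketch below only illustrates the general idea of a cross-validated grid search, using a toy one-dimensional k-nearest-neighbours classifier written in plain Python. All names (knn_predict, cross_val_accuracy, grid_search) and the data are hypothetical and not part of MLutils.

```python
def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query (1-D data)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(x, y, k, n_folds):
    """Mean accuracy over n_folds interleaved train/test splits."""
    fold_scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(x)) if i % n_folds == fold]
        train_idx = [i for i in range(len(x)) if i % n_folds != fold]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / n_folds


def grid_search(x, y, grid, n_folds=3):
    """Score every candidate by cross-validation and return the best one."""
    scores = {k: cross_val_accuracy(x, y, k, n_folds) for k in grid}
    best = max(scores, key=scores.get)  # first candidate with the top score
    return best, scores


# Two clusters around 1 and 3, plus one deliberately mislabelled point (1.05).
x = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9, 1.05, 2.7]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
best_k, scores = grid_search(x, y, grid=[1, 3, 5])
```

Each candidate is scored only on data it was not trained on, which is why a candidate that merely memorises noisy points (here k=1) tends to score lower than one that averages over several neighbours.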

class MLutils.Model

Bases: object

Parent class of all model objects. Two classes inherit from it: Regressor and Classifier.

static getModel(task, n_jobs=1)

class MLutils.Preprocess(scaler=None, numeric_na_fill_method=None, category_na_fill_method=None, one_hot_encoding=True)

Bases: object

Preprocessing data (scaling, imputation, one hot encoding)

Parameters
  • scaler (str, optional) – Type of scaling to use. Defaults to None.

  • numeric_na_fill_method (str, optional) – Imputation method for numerical variables. Defaults to None.

  • category_na_fill_method (str, optional) – Imputation method for categorical variables. Defaults to None.

  • one_hot_encoding (bool, optional) – Whether to encode non-numeric categorical variables. Defaults to True.
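
The accepted values for scaler and the two fill methods are not listed in the source. As a rough illustration of the three steps named above, the sketch below hard-codes one strategy for each (mean imputation, most-frequent imputation, standard scaling, one-hot encoding) and works on plain lists of dicts rather than DataFrames. TinyPreprocess and its arguments are hypothetical and are not the MLutils implementation.

```python
from statistics import mean, pstdev


class TinyPreprocess:
    """Hypothetical stand-in showing the fit / transform split."""

    def __init__(self, numeric, categorical):
        self.numeric = numeric          # names of numeric columns
        self.categorical = categorical  # names of categorical columns

    def fit(self, rows):
        # Learn imputation values, scaling statistics and category levels.
        self.stats, self.levels, self.fill = {}, {}, {}
        for col in self.numeric:
            seen = [r[col] for r in rows if r[col] is not None]
            self.stats[col] = (mean(seen), pstdev(seen) or 1.0)
        for col in self.categorical:
            seen = [r[col] for r in rows if r[col] is not None]
            self.levels[col] = sorted(set(seen))
            self.fill[col] = max(set(seen), key=seen.count)  # most frequent
        return self

    def transform(self, rows):
        # Apply what fit learned; nothing is re-estimated here.
        out = []
        for r in rows:
            new = {}
            for col in self.numeric:
                mu, sigma = self.stats[col]
                value = mu if r[col] is None else r[col]
                new[col] = (value - mu) / sigma
            for col in self.categorical:
                value = self.fill[col] if r[col] is None else r[col]
                for level in self.levels[col]:
                    new[f"{col}_{level}"] = int(value == level)
            out.append(new)
        return out

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


train = [
    {"age": 20.0, "city": "Paris"},
    {"age": None, "city": "Lyon"},
    {"age": 40.0, "city": None},
    {"age": 30.0, "city": "Paris"},
]
prep = TinyPreprocess(numeric=["age"], categorical=["city"])
encoded = prep.fit_transform(train)
unseen = prep.transform([{"age": 30.0, "city": "Nice"}])
```

Keeping fit separate from transform lets the statistics learned on training data be reused unchanged on new data. In this sketch a category not seen during fit ("Nice") is encoded as all zeros.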

fit(df)

fit_transform(df)

transform(df, verbose=True)

class MLutils.Regressor(n_jobs=-1)

Bases: MLutils.Model

compute_metrics_and_graphs(pred, actual, output_path='outputs/plots/mlflow_artifacts')

Evaluate the model’s performance.

Parameters
  • pred (list) – List of predictions

  • actual (list) – List of labels

  • output_path (str, optional) – Path indicating where to store the plots. Defaults to “outputs/plots/mlflow_artifacts”.
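
Which regression metrics this method computes is not stated in the source. A minimal sketch, assuming mean absolute error, root mean squared error and R² are of interest, is shown below. regression_metrics is a hypothetical helper, not part of MLutils, and it does not produce any plots.

```python
from math import sqrt


def regression_metrics(pred, actual):
    """Hypothetical helper: MAE, RMSE and R^2 for paired predictions and targets."""
    if not actual or len(pred) != len(actual):
        raise ValueError("pred and actual must be non-empty and of equal length")
    errors = [p - a for p, a in zip(pred, actual)]
    squared_error = sum(e * e for e in errors)
    mean_actual = sum(actual) / len(actual)
    # Total variance of the targets around their mean.
    total = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs(e) for e in errors) / len(errors),
        "rmse": sqrt(squared_error / len(errors)),
        # R^2 is undefined when every target is identical.
        "r2": 1 - squared_error / total if total else float("nan"),
    }


metrics = regression_metrics(pred=[2.5, 0.0, 2.0, 8.0], actual=[3.0, -0.5, 2.0, 7.0])
```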

fit(x, y)

MLutils.explain(x, model, task, path='outputs/plots/mlflow_artifacts/shap', n_features=5)

Explain a model’s decisions based on SHAP value approximation. The SHAP algorithm is quadratic in the depth of the trees, so be careful not to set max_depth above 12.

Parameters
  • x (DataFrame) – Input data

  • model ([type]) – Model to explain

  • task (str) – Task to perform. Available: regression, classification.

  • path (str, optional) – [description]. Defaults to “outputs/plots/mlflow_artifacts/shap”.

  • n_features (int, optional) – Number of most important features for which to generate partial dependence plots. Defaults to 5.
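
To make the quantity being approximated concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature function. This is not the tree-based approximation that explain relies on, and its cost grows factorially with the number of features. The names shapley_values and toy_model are hypothetical, as is the choice of replacing absent features with a fixed baseline.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet added to the coalition keep their baseline value.
    """
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            # and credit it with the resulting change in the output.
            current[i] = x[i]
            value = model(current)
            contributions[i] += value - previous
            previous = value
    # Average the marginal contributions over all feature orderings.
    return [c / len(orderings) for c in contributions]


def toy_model(features):
    a, b, c = features
    return 2.0 * a + b * c  # one additive term, one interaction term


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The values sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term b * c is split evenly between the two features involved.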