mlops.examples.image.classification package

Submodules

mlops.examples.image.classification.errors module

Contains custom errors for the Pokemon classification example.

exception mlops.examples.image.classification.errors.LabelsNotFoundError

Bases: FileNotFoundError

Raised when a PokemonClassificationDataProcessor attempts to load labels for prediction data, which is an unlabeled data source.

exception mlops.examples.image.classification.errors.NoModelPathsSuppliedError

Bases: ValueError

Raised when a non-empty collection of strings representing paths to models is expected, but an empty collection is passed instead.
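Because both exceptions subclass built-in types, callers can catch them either by their own names or through their bases. The sketch below redefines stand-in classes with the documented names and bases so that it runs without the mlops package installed. The load_labels and check_model_paths helpers are hypothetical and exist only for illustration.

```python
class LabelsNotFoundError(FileNotFoundError):
    """Stand-in mirroring the documented base class."""


class NoModelPathsSuppliedError(ValueError):
    """Stand-in mirroring the documented base class."""


def load_labels(dataset_path: str) -> None:
    # Hypothetical helper: prediction data has no labels to load.
    raise LabelsNotFoundError(f"No labels under {dataset_path}")


def check_model_paths(model_paths) -> None:
    # Hypothetical helper: reject an empty collection of paths.
    if len(model_paths) == 0:
        raise NoModelPathsSuppliedError("Expected at least one model path.")


caught = []

try:
    load_labels("some/prediction/dir")
except FileNotFoundError as err:  # also catches LabelsNotFoundError
    caught.append(type(err).__name__)

try:
    check_model_paths([])
except ValueError as err:  # also catches NoModelPathsSuppliedError
    caught.append(type(err).__name__)
```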

mlops.examples.image.classification.model_prediction module

Loads a VersionedModel and uses it to run prediction on unseen data.

mlops.examples.image.classification.model_prediction.get_best_model(model_paths: Collection) → mlops.model.versioned_model.VersionedModel

Returns the versioned model with the best performance on the validation dataset.

Parameters

model_paths – The paths to the versioned models to load.

Returns

The versioned model with the best performance on the validation dataset.
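The selection step can be sketched as follows. This is only an illustration under stated assumptions: "best" is taken to mean the lowest validation loss, LoadedModel is a hypothetical stand-in for a loaded VersionedModel, and the losses are supplied in a dictionary rather than read from published models. The real function may use a different metric and loading mechanism.

```python
from dataclasses import dataclass
from typing import Collection, Dict


@dataclass
class LoadedModel:
    """Hypothetical stand-in for a loaded versioned model."""
    path: str
    val_loss: float


def pick_best(model_paths: Collection[str],
              val_losses: Dict[str, float]) -> LoadedModel:
    """Returns the candidate with the lowest validation loss."""
    if not model_paths:
        # The documented NoModelPathsSuppliedError is a ValueError subclass;
        # a plain ValueError keeps this sketch self-contained.
        raise ValueError("model_paths must be non-empty")
    candidates = [LoadedModel(path, val_losses[path]) for path in model_paths]
    return min(candidates, key=lambda model: model.val_loss)


best = pick_best(["a", "b", "c"], {"a": 0.4, "b": 0.2, "c": 0.9})
```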

mlops.examples.image.classification.model_prediction.main() → None

Runs the program.

mlops.examples.image.classification.model_prediction.model_evaluate(dataset: mlops.dataset.versioned_dataset.VersionedDataset, model: mlops.model.versioned_model.VersionedModel) → float

Returns the model’s loss on the test dataset.

Parameters
  • dataset – The dataset.

  • model – The model.

Returns

The model’s loss on the test dataset.

mlops.examples.image.classification.model_prediction.model_predict(features: numpy.ndarray, dataset: mlops.dataset.versioned_dataset.VersionedDataset, model: mlops.model.versioned_model.VersionedModel) → numpy.ndarray

Returns the model’s unpreprocessed predictions on the given preprocessed features.

Parameters
  • features – The preprocessed features on which to run prediction.

  • dataset – The dataset.

  • model – The model.

Returns

The model’s unpreprocessed predictions on the given preprocessed features.

mlops.examples.image.classification.pokemon_classification_data_processor module

Contains the PokemonClassificationDataProcessor class.

class mlops.examples.image.classification.pokemon_classification_data_processor.PokemonClassificationDataProcessor

Bases: mlops.dataset.invertible_data_processor.InvertibleDataProcessor

Transforms the Pokemon dataset at sample_data/pokemon into features and labels for classification.

get_raw_features(dataset_path: str) → Dict[str, numpy.ndarray]

Returns the raw feature tensors from the prediction dataset path. Raw features are tensors of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1]. The features are already scaled because PNG images load into float32 instead of uint8.

Parameters

dataset_path – The path to the file or directory on the local or remote filesystem containing the dataset.

Returns

A dictionary whose values are feature tensors and whose corresponding keys are the names by which those tensors should be referenced. The returned keys will be {‘X_train’, ‘X_val’, ‘X_test’} if the directory indicated by dataset_path ends with ‘trainvaltest’, and {‘X_pred’} otherwise.
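The key-selection rule described above can be sketched as a small helper. The function name expected_feature_keys is hypothetical, and stripping a trailing slash before the suffix check is an assumption about how the path is normalized.

```python
from typing import Set


def expected_feature_keys(dataset_path: str) -> Set[str]:
    """Returns the dictionary keys implied by the dataset path."""
    # Assumption: a trailing separator is ignored, so that
    # "data/trainvaltest/" is treated like "data/trainvaltest".
    if dataset_path.rstrip("/").endswith("trainvaltest"):
        return {"X_train", "X_val", "X_test"}
    return {"X_pred"}


train_keys = expected_feature_keys("sample_data/pokemon/trainvaltest")
pred_keys = expected_feature_keys("sample_data/pokemon/pred")
```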

get_raw_features_and_labels(dataset_path: str) → Tuple[Dict[str, numpy.ndarray], Dict[str, numpy.ndarray]]

Returns the raw feature and label tensors from the dataset path. This method is specifically used for the train/val/test sets and not input data for prediction, because in some cases the features and labels need to be read simultaneously to ensure proper ordering of features and labels.

Raw features are tensors of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1]. Raw labels are tensors of shape m x 2, where m is the number of examples. All entries are strings from CLASSES naming the 1 or 2 types (2 if multi-typed) belonging to the sample. Types are not ordered.

Parameters

dataset_path – The path to the file or directory on the local or remote filesystem containing the dataset, specifically train/val/test and not prediction data.

Returns

A 2-tuple of the features dictionary and labels dictionary, with matching keys and ordered tensors.
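To illustrate why features and labels are read together, the sketch below appends each image and its types in the same loop iteration, so row i of the features always matches row i of the labels. The in-memory records dictionary is a hypothetical stand-in for files on disk, and filling the second slot of a single-typed sample with an empty string is an assumption; the documentation does not say how that slot is populated.

```python
import numpy as np

# Hypothetical records: file name -> (image tensor, two type slots).
records = {
    "bulbasaur.png": (np.zeros((4, 4, 3), dtype=np.float32), ("Grass", "Poison")),
    "charmander.png": (np.ones((4, 4, 3), dtype=np.float32), ("Fire", "")),
}


def read_paired(records):
    """Reads features and labels in one pass so their rows stay aligned."""
    features, labels = [], []
    for name in sorted(records):  # deterministic order
        image, types = records[name]
        features.append(image)
        labels.append(types)
    return np.stack(features), np.array(labels, dtype=object)


X, y = read_paired(records)
```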

static get_valid_prediction(pred_arr: numpy.ndarray, threshold: float = 0.5) → numpy.ndarray

Returns a valid binary prediction from the raw prediction tensor. A valid prediction has one or two 1s, and all other entries are 0. The highest value in the prediction array is automatically converted to a 1, and the second-highest is converted to a 1 if the value is higher than the given decision threshold.

Parameters
  • pred_arr – The raw model predictions; a tensor of shape m x k, where m is the number of examples and k is the number of classes. All entries are in the range [0, 1].

  • threshold – The decision threshold, in the range [0, 1]. If the second-highest value in pred_arr is greater than this threshold, it will be converted to a 1. The highest value is automatically converted to a 1 (Pokemon have at least 1 type).

Returns

The valid binary predictions; a tensor of shape m x k, where m is the number of examples and k is the number of classes. All entries are in the set {0, 1}, and in each example there are 1 or 2 ones.
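The thresholding rule described above can be sketched with NumPy as follows. This is an illustrative reimplementation, not the package's own code; the function name get_valid_prediction_sketch is hypothetical, and the sketch assumes at least two classes (k >= 2).

```python
import numpy as np


def get_valid_prediction_sketch(pred_arr, threshold=0.5):
    """Converts raw scores of shape (m, k) into 0/1 rows with 1 or 2 ones."""
    pred_arr = np.asarray(pred_arr, dtype=float)
    rows = np.arange(pred_arr.shape[0])
    # Column indices sorted by ascending score; the last two are the largest.
    order = np.argsort(pred_arr, axis=1)
    top, second = order[:, -1], order[:, -2]
    valid = np.zeros(pred_arr.shape, dtype=int)
    # The highest score always becomes a 1.
    valid[rows, top] = 1
    # The runner-up becomes a 1 only if it exceeds the threshold.
    keep_second = pred_arr[rows, second] > threshold
    valid[rows[keep_second], second[keep_second]] = 1
    return valid


scores = [[0.9, 0.6, 0.1],
          [0.3, 0.2, 0.1],
          [0.7, 0.1, 0.8]]
result = get_valid_prediction_sketch(scores)
```

In the second row every score is below the threshold, yet the largest one is still set to 1, which matches the rule that every example receives at least one type.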

preprocess_features(raw_feature_tensor: numpy.ndarray) → numpy.ndarray

Returns the preprocessed feature tensor from the raw tensor. Training, validation, test, and prediction data are all fed into downstream models in this preprocessed form. The preprocessed tensors are of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1].

Parameters

raw_feature_tensor – The raw features to be preprocessed.

Returns

The preprocessed feature tensor. This tensor is ready for downstream model consumption.

preprocess_labels(raw_label_tensor: numpy.ndarray) → numpy.ndarray

Returns the preprocessed label tensor from the raw tensor. Training, validation, and test labels are fed into downstream models in this preprocessed form. Preprocessed labels are tensors of shape m x k, where m is the number of examples and k is the number of classes. Each k-length vector is a binary multi-label encoding with a minimum of 1 and a maximum of 2 nonzero entries.

Parameters

raw_label_tensor – The raw labels to be preprocessed.

Returns

The preprocessed label tensor. This tensor is ready for downstream model consumption.
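A multi-label (multi-hot) encoding of this kind can be sketched as below. The CLASSES list here is a hypothetical four-type subset, the function name encode_labels is illustrative, and the empty string marking an unused second slot is an assumption rather than documented behavior.

```python
import numpy as np

CLASSES = ["Fire", "Grass", "Poison", "Water"]  # hypothetical subset


def encode_labels(raw_labels):
    """Maps (m, 2) type-name rows to (m, k) binary multi-label vectors."""
    index = {name: i for i, name in enumerate(CLASSES)}
    encoded = np.zeros((len(raw_labels), len(CLASSES)), dtype=np.float32)
    for row, types in enumerate(raw_labels):
        for type_name in types:
            if type_name in index:  # skips the assumed "" placeholder
                encoded[row, index[type_name]] = 1.0
    return encoded


raw = np.array([["Grass", "Poison"], ["Fire", ""]], dtype=object)
encoded = encode_labels(raw)
```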

unpreprocess_features(feature_tensor: numpy.ndarray) → numpy.ndarray

Returns the raw feature tensor from the preprocessed tensor; inverts preprocessing. Improves model interpretability by enabling users to transform model inputs into real-world values.

Parameters

feature_tensor – The preprocessed features to be inverted.

Returns

The raw feature tensor.

unpreprocess_labels(label_tensor: numpy.ndarray) → numpy.ndarray

Returns the raw label tensor from the preprocessed tensor; inverts preprocessing. Improves model interpretability by enabling users to transform model outputs into real-world values.

Parameters

label_tensor – The preprocessed labels to be inverted.

Returns

The raw label tensor.
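Inverting a binary multi-label encoding back into type names can be sketched as follows. The CLASSES list is a hypothetical subset, decode_labels is an illustrative name, and padding single-typed rows with an empty string is an assumption. The sketch expects each row to contain 1 or 2 ones; because types are unordered, names come back in class-index order.

```python
import numpy as np

CLASSES = ["Fire", "Grass", "Poison", "Water"]  # hypothetical subset


def decode_labels(encoded, placeholder=""):
    """Maps (m, k) binary rows back to (m, 2) rows of type names."""
    decoded = []
    for row in np.asarray(encoded):
        names = [CLASSES[i] for i in np.flatnonzero(row)]
        # Pad single-typed rows so every row has exactly two slots.
        names += [placeholder] * (2 - len(names))
        decoded.append(names)
    return np.array(decoded, dtype=object)


decoded = decode_labels([[0, 1, 1, 0], [1, 0, 0, 0]])
```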

mlops.examples.image.classification.publish_dataset module

Publishes a new dataset to the local or remote filesystem. This script should be run any time the data processor changes.

mlops.examples.image.classification.publish_dataset.main() → None

Runs the program.

mlops.examples.image.classification.publish_dataset.publish_dataset(publication_path: str) → str

Builds and publishes the dataset.

Parameters

publication_path – The path on the local or remote filesystem to which to publish the dataset.

Returns

The versioned dataset’s publication path.

mlops.examples.image.classification.train_model module

Trains a new model on the Pokemon classification task.

mlops.examples.image.classification.train_model.get_baseline_model(dataset: mlops.dataset.versioned_dataset.VersionedDataset) → keras.engine.training.Model

Returns a new Keras Model for use on the dataset. This model is only a baseline; developers should also experiment with custom models in notebook environments.

Parameters

dataset – The input dataset. Used to determine model input and output shapes.

Returns

A new Keras Model for use on the dataset.

mlops.examples.image.classification.train_model.main() → None

Runs the program.

mlops.examples.image.classification.train_model.publish_model(model: keras.engine.training.Model, dataset: mlops.dataset.versioned_dataset.VersionedDataset, training_config: mlops.model.training_config.TrainingConfig, publication_path: str, tags: Optional[List[str]] = None) → str

Publishes the model to the path on the local or remote filesystem.

Parameters
  • model – The model to be published, with the exact weights desired for publication (the user needs to set the weights to the best found during training if that is what they desire).

  • dataset – The input dataset.

  • training_config – The training configuration.

  • publication_path – The path to which the model will be published.

  • tags – Optional tags for the published model.

Returns

The versioned model’s publication path.

mlops.examples.image.classification.train_model.train_model(model: keras.engine.training.Model, dataset: mlops.dataset.versioned_dataset.VersionedDataset, model_checkpoint_filename: Optional[str] = None, **fit_kwargs: Any) → mlops.model.training_config.TrainingConfig

Trains the model on the dataset and returns the training configuration object.

Parameters
  • model – The Keras Model to be trained.

  • dataset – The input dataset.

  • model_checkpoint_filename – If supplied, saves model checkpoints to the specified path.

  • fit_kwargs – Keyword arguments to be passed to model.fit().

Returns

The training configuration.
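The way extra keyword arguments might be forwarded to model.fit() can be sketched with a stand-in model. Everything here is hypothetical: RecordingModel replaces a real Keras Model so the example runs without Keras, train_sketch is not the package's function, and the tuple appended for the checkpoint is a placeholder for whatever checkpoint callback the real implementation registers.

```python
from typing import Any, Dict, List, Optional


class RecordingModel:
    """Hypothetical stand-in for a Keras Model; records fit() arguments."""

    def __init__(self) -> None:
        self.fit_calls: List[Dict[str, Any]] = []

    def fit(self, x, y, **kwargs: Any) -> Dict[str, list]:
        self.fit_calls.append(kwargs)
        return {"loss": [0.0]}


def train_sketch(model, x, y,
                 model_checkpoint_filename: Optional[str] = None,
                 **fit_kwargs: Any):
    """Forwards fit_kwargs to model.fit(), adding a checkpoint if requested."""
    # Merge caller-supplied callbacks with the optional checkpoint entry.
    callbacks = list(fit_kwargs.pop("callbacks", []))
    if model_checkpoint_filename is not None:
        # Placeholder for a real checkpoint callback object.
        callbacks.append(("checkpoint", model_checkpoint_filename))
    return model.fit(x, y, callbacks=callbacks, **fit_kwargs)


model = RecordingModel()
history = train_sketch(model, [1.0], [0.0],
                       model_checkpoint_filename="ckpt.h5",
                       epochs=3, batch_size=8)
```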

Module contents

Contains an example of an image classification task.