mlops.examples.image.classification package

Submodules

mlops.examples.image.classification.errors module

Contains custom errors for the Pokemon classification example.

exception mlops.examples.image.classification.errors.LabelsNotFoundError

Bases: FileNotFoundError

Raised when a PokemonClassificationDataProcessor attempts to load labels from prediction data, which is an unlabeled data source.

exception mlops.examples.image.classification.errors.NoModelPathsSuppliedError

Bases: ValueError

Raised when a non-empty collection of strings representing paths to models is expected, but an empty collection is passed instead.
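The guard described above can be sketched as follows. This is a minimal illustration, not the package's code: the exception is redefined locally so the snippet runs on its own, and require_model_paths is a hypothetical helper name that does not appear in the package.

```python
from typing import Collection


class NoModelPathsSuppliedError(ValueError):
    """Local stand-in for the package's exception (same base class)."""


def require_model_paths(model_paths: Collection[str]) -> None:
    """Hypothetical helper: rejects an empty collection of model paths."""
    if len(model_paths) == 0:
        raise NoModelPathsSuppliedError(
            "Expected at least one model path, but none were supplied."
        )
```

Because the exception derives from ValueError, callers that already handle ValueError will also catch it.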

mlops.examples.image.classification.model_prediction module

Loads a VersionedModel and uses it to run prediction on unseen data.

mlops.examples.image.classification.model_prediction.get_best_model(model_paths: Collection) → mlops.model.versioned_model.VersionedModel

Returns the versioned model with the best performance on the validation dataset.

Parameters: model_paths – The paths to the versioned models to load.
Returns: The versioned model with the best performance on the validation dataset.

mlops.examples.image.classification.model_prediction.main() → None

Runs the program.

mlops.examples.image.classification.model_prediction.model_evaluate(dataset: mlops.dataset.versioned_dataset.VersionedDataset, model: mlops.model.versioned_model.VersionedModel) → float

Returns the model’s loss on the test dataset.

Parameters

dataset – The dataset.
model – The model.

Returns

The model’s loss on the test dataset.

mlops.examples.image.classification.model_prediction.model_predict(features: numpy.ndarray, dataset: mlops.dataset.versioned_dataset.VersionedDataset, model: mlops.model.versioned_model.VersionedModel) → numpy.ndarray

Returns the model’s unpreprocessed predictions on the given preprocessed features.

Parameters

features – The preprocessed features on which to run prediction.
dataset – The dataset.
model – The model.

Returns

The model’s unpreprocessed predictions on the given preprocessed features.

mlops.examples.image.classification.pokemon_classification_data_processor module

Contains the PokemonClassificationDataProcessor class.

class mlops.examples.image.classification.pokemon_classification_data_processor.PokemonClassificationDataProcessor

Bases: mlops.dataset.invertible_data_processor.InvertibleDataProcessor

Transforms the Pokemon dataset at sample_data/pokemon into features and labels for classification.

get_raw_features(dataset_path: str) → Dict[str, numpy.ndarray]

Returns the raw feature tensors from the prediction dataset path. Raw features are tensors of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1]. The features are already scaled because PNG images load into float32 instead of uint8.

Parameters: dataset_path – The path to the file or directory on the local or remote filesystem containing the dataset.
Returns: A dictionary whose values are feature tensors and whose corresponding keys are the names by which those tensors should be referenced. The returned keys will be {‘X_train’, ‘X_val’, ‘X_test’} if the directory indicated by dataset_path ends with ‘trainvaltest’, and {‘X_pred’} otherwise.
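The key-selection rule in the return description can be sketched as below. This is an assumption-laden illustration: feature_keys_sketch is a hypothetical name, the example paths are made up, and ignoring a trailing slash is a choice made here rather than documented behavior.

```python
from typing import Set


def feature_keys_sketch(dataset_path: str) -> Set[str]:
    """Hypothetical helper mirroring the documented key-naming rule."""
    # Assumption: a trailing path separator is ignored before the suffix check.
    if dataset_path.rstrip("/").endswith("trainvaltest"):
        return {"X_train", "X_val", "X_test"}
    return {"X_pred"}
```

Under these assumptions, a path such as sample_data/pokemon/trainvaltest yields the three split keys, and any other path yields only X_pred.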

get_raw_features_and_labels(dataset_path: str) → Tuple[Dict[str, numpy.ndarray], Dict[str, numpy.ndarray]]

Returns the raw feature and label tensors from the dataset path. This method is specifically used for the train/val/test sets and not input data for prediction, because in some cases the features and labels need to be read simultaneously to ensure proper ordering of features and labels.

Raw features are tensors of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1]. Raw labels are tensors of shape m x 2, where m is the number of examples. All entries are strings from CLASSES indicating 1 or 2 (if multi-typed) types belonging to the sample. Types are not ordered.

Parameters: dataset_path – The path to the file or directory on the local or remote filesystem containing the dataset, specifically train/val/test and not prediction data.
Returns: A 2-tuple of the features dictionary and labels dictionary, with matching keys and ordered tensors.

static get_valid_prediction(pred_arr: numpy.ndarray, threshold: float = 0.5) → numpy.ndarray

Returns a valid binary prediction from the raw prediction tensor. A valid prediction has one or two 1s, and all other entries are 0. The highest value in the prediction array is automatically converted to a 1, and the second-highest is converted to a 1 if the value is higher than the given decision threshold.

Parameters

pred_arr – The raw model predictions; a tensor of shape m x k, where m is the number of examples and k is the number of classes. All entries are in the range [0, 1].
threshold – The decision threshold, in the range [0, 1]. If the second-highest value in pred_arr is greater than this threshold, it will be converted to a 1. The highest value is automatically converted to a 1 (Pokemon have at least 1 type).

Returns

The valid binary predictions; a tensor of shape m x k, where m is the number of examples and k is the number of classes. All entries are in the set {0, 1}, and each example contains 1 or 2 ones.
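The thresholding rule described above can be sketched with NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the package's actual code: the function name is hypothetical, the output dtype is chosen as int, and k is assumed to be at least 2.

```python
import numpy as np


def get_valid_prediction_sketch(
    pred_arr: np.ndarray, threshold: float = 0.5
) -> np.ndarray:
    """Illustrative version of the documented rule (assumes k >= 2)."""
    # Class indices sorted by ascending score within each row.
    order = np.argsort(pred_arr, axis=1)
    top = order[:, -1]
    second = order[:, -2]
    rows = np.arange(pred_arr.shape[0])
    valid = np.zeros_like(pred_arr, dtype=int)
    # The highest-scoring class is always set: every example has one type.
    valid[rows, top] = 1
    # The runner-up is set only when it strictly exceeds the threshold.
    keep_second = pred_arr[rows, second] > threshold
    valid[rows[keep_second], second[keep_second]] = 1
    return valid


raw = np.array([[0.9, 0.6, 0.1], [0.2, 0.3, 0.8]])
binary = get_valid_prediction_sketch(raw)
```

In the first row the runner-up score of 0.6 exceeds the default threshold, so two classes are set; in the second row only the top class is set.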

preprocess_features(raw_feature_tensor: numpy.ndarray) → numpy.ndarray

Returns the preprocessed feature tensor from the raw tensor. Preprocessed features are the form in which training, validation, test, and prediction data are fed into downstream models. The preprocessed tensors are of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1].

Parameters: raw_feature_tensor – The raw features to be preprocessed.
Returns: The preprocessed feature tensor. This tensor is ready for downstream model consumption.

preprocess_labels(raw_label_tensor: numpy.ndarray) → numpy.ndarray

Returns the preprocessed label tensor from the raw tensor. Preprocessed labels are the form in which training, validation, and test labels are fed into downstream models. Preprocessed labels are tensors of shape m x k, where m is the number of examples and k is the number of classes. Each k-length vector is a binary multi-label encoding with at least 1 and at most 2 entries set to 1.

Parameters: raw_label_tensor – The raw labels to be preprocessed.
Returns: The preprocessed label tensor. This tensor is ready for downstream model consumption.
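The multi-label encoding described above can be sketched as follows. Several things here are assumptions for illustration only: the CLASSES list is a short made-up stand-in for the package's constant, multi_hot_sketch is a hypothetical name, the float32 output dtype is a choice, and single-typed samples are assumed to repeat their type in both raw-label columns.

```python
import numpy as np

# Hypothetical class list; the real CLASSES constant lives in the package.
CLASSES = ["Fire", "Water", "Grass", "Flying"]


def multi_hot_sketch(raw_labels: np.ndarray) -> np.ndarray:
    """Encodes an m x 2 array of type names as an m x k binary array."""
    class_index = {name: idx for idx, name in enumerate(CLASSES)}
    encoded = np.zeros((raw_labels.shape[0], len(CLASSES)), dtype=np.float32)
    for row, types in enumerate(raw_labels):
        for type_name in types:
            # Setting the same position twice is harmless, so a repeated
            # type (assumed single-typed sample) yields a single 1.
            encoded[row, class_index[type_name]] = 1
    return encoded


raw_labels = np.array([["Fire", "Flying"], ["Water", "Water"]])
encoded = multi_hot_sketch(raw_labels)
```

With this stand-in class list, the dual-typed first sample produces two ones and the single-typed second sample produces one.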

unpreprocess_features(feature_tensor: numpy.ndarray) → numpy.ndarray

Returns the raw feature tensor from the preprocessed tensor; inverts preprocessing. Improves model interpretability by enabling users to transform model inputs into real-world values.

Parameters: feature_tensor – The preprocessed features to be inverted.
Returns: The raw feature tensor.

unpreprocess_labels(label_tensor: numpy.ndarray) → numpy.ndarray

Returns the raw label tensor from the preprocessed tensor; inverts preprocessing. Improves model interpretability by enabling users to transform model outputs into real-world values.

Parameters: label_tensor – The preprocessed labels to be inverted.
Returns: The raw label tensor.

mlops.examples.image.classification.publish_dataset module

Publishes a new dataset to the local or remote filesystem. This script should be run any time the data processor changes.

mlops.examples.image.classification.publish_dataset.main() → None

Runs the program.

mlops.examples.image.classification.publish_dataset.publish_dataset(publication_path: str) → str

Builds and publishes the dataset.

Parameters: publication_path – The path on the local or remote filesystem to which to publish the dataset.
Returns: The versioned dataset’s publication path.

mlops.examples.image.classification.train_model module

Trains a new model on the Pokemon classification task.

mlops.examples.image.classification.train_model.get_baseline_model(dataset: mlops.dataset.versioned_dataset.VersionedDataset) → keras.engine.training.Model

Returns a new Keras Model for use on the dataset. This model is only a baseline; developers should also experiment with custom models in notebook environments.

Parameters: dataset – The input dataset. Used to determine model input and output shapes.
Returns: A new Keras Model for use on the dataset.

mlops.examples.image.classification.train_model.main() → None

Runs the program.

mlops.examples.image.classification.train_model.publish_model(model: keras.engine.training.Model, dataset: mlops.dataset.versioned_dataset.VersionedDataset, training_config: mlops.model.training_config.TrainingConfig, publication_path: str, tags: Optional[List[str]] = None) → str

Publishes the model to the path on the local or remote filesystem.

Parameters

model – The model to be published, with the exact weights desired for publication (the user needs to set the weights to the best found during training if that is what they desire).
dataset – The input dataset.
training_config – The training configuration.
publication_path – The path to which the model will be published.
tags – Optional tags for the published model.

Returns

The versioned model’s publication path.

mlops.examples.image.classification.train_model.train_model(model: keras.engine.training.Model, dataset: mlops.dataset.versioned_dataset.VersionedDataset, model_checkpoint_filename: Optional[str] = None, **fit_kwargs: Any) → mlops.model.training_config.TrainingConfig

Trains the model on the dataset and returns the training configuration object.

Parameters

model – The Keras Model to be trained.
dataset – The input dataset.
model_checkpoint_filename – If supplied, saves model checkpoints to the specified path.
fit_kwargs – Keyword arguments to be passed to model.fit().

Returns

The training configuration.

Module contents

Contains an example of an image classification task.