mlops.examples.image.classification package
Submodules
mlops.examples.image.classification.errors module
Contains custom errors for the Pokemon classification example.
- exception mlops.examples.image.classification.errors.LabelsNotFoundError
Bases:
FileNotFoundError
Raised when a PokemonClassificationDataProcessor attempts to load labels from prediction data, which is an unlabeled data source.
- exception mlops.examples.image.classification.errors.NoModelPathsSuppliedError
Bases:
ValueError
Raised when a non-empty collection of strings representing paths to models is expected, but an empty collection is passed instead.
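As a minimal sketch, exceptions with the bases listed above could be declared and caught as follows. The docstrings and the require_model_paths helper are illustrative assumptions, not the package's actual code.

```python
from typing import Collection


class LabelsNotFoundError(FileNotFoundError):
    """Raised when labels are requested from an unlabeled prediction source."""


class NoModelPathsSuppliedError(ValueError):
    """Raised when an empty collection of model paths is supplied."""


def require_model_paths(model_paths: Collection[str]) -> Collection[str]:
    # Hypothetical helper: reject an empty collection up front.
    if len(model_paths) == 0:
        raise NoModelPathsSuppliedError("Expected at least one model path.")
    return model_paths


try:
    require_model_paths([])
except ValueError as err:
    # Catching the built-in base class also catches the custom error.
    caught = type(err).__name__
```

Because each error subclasses a built-in exception, callers that already handle FileNotFoundError or ValueError keep working without knowing about the custom types.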
mlops.examples.image.classification.model_prediction module
Loads a VersionedModel and uses it to run prediction on unseen data.
- mlops.examples.image.classification.model_prediction.get_best_model(model_paths: Collection) → mlops.model.versioned_model.VersionedModel
Returns the versioned model with the best performance on the validation dataset.
- Parameters
model_paths – The paths to the versioned models to load.
- Returns
The versioned model with the best performance on the validation dataset.
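The selection step can be sketched as below. This is not the package's implementation: the validation_loss callable is a hypothetical stand-in for loading a VersionedModel and reading its validation performance, and the sketch assumes that a lower loss is better. The paths and loss values are made up.

```python
from typing import Callable, Collection


def pick_best_path(
    model_paths: Collection[str],
    validation_loss: Callable[[str], float],
) -> str:
    """Returns the path whose model has the lowest validation loss."""
    if not model_paths:
        # The real function presumably raises NoModelPathsSuppliedError here.
        raise ValueError("model_paths must not be empty.")
    return min(model_paths, key=validation_loss)


# Made-up losses keyed by made-up paths, purely for illustration.
losses = {"models/a": 0.42, "models/b": 0.31, "models/c": 0.57}
best = pick_best_path(list(losses), losses.__getitem__)
```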
- mlops.examples.image.classification.model_prediction.main() → None
Runs the program.
- mlops.examples.image.classification.model_prediction.model_evaluate(dataset: mlops.dataset.versioned_dataset.VersionedDataset, model: mlops.model.versioned_model.VersionedModel) → float
Returns the model’s loss on the test dataset.
- Parameters
dataset – The dataset.
model – The model.
- Returns
The model’s loss on the test dataset.
- mlops.examples.image.classification.model_prediction.model_predict(features: numpy.ndarray, dataset: mlops.dataset.versioned_dataset.VersionedDataset, model: mlops.model.versioned_model.VersionedModel) → numpy.ndarray
Returns the model’s unpreprocessed predictions on the given preprocessed features.
- Parameters
features – The preprocessed features on which to run prediction.
dataset – The dataset.
model – The model.
- Returns
The model’s unpreprocessed predictions on the given preprocessed features.
mlops.examples.image.classification.pokemon_classification_data_processor module
Contains the PokemonClassificationDataProcessor class.
- class mlops.examples.image.classification.pokemon_classification_data_processor.PokemonClassificationDataProcessor
Bases:
mlops.dataset.invertible_data_processor.InvertibleDataProcessor
Transforms the Pokemon dataset at sample_data/pokemon into features and labels for classification.
- get_raw_features(dataset_path: str) → Dict[str, numpy.ndarray]
Returns the raw feature tensors from the prediction dataset path. Raw features are tensors of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1]. The features are already scaled because PNG images load into float32 instead of uint8.
- Parameters
dataset_path – The path to the file or directory on the local or remote filesystem containing the dataset.
- Returns
A dictionary whose values are feature tensors and whose corresponding keys are the names by which those tensors should be referenced. The returned keys will be {‘X_train’, ‘X_val’, ‘X_test’} if the directory indicated by dataset_path ends with ‘trainvaltest’, and {‘X_pred’} otherwise.
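The key-selection rule described above can be sketched as follows. The function name and the example paths are hypothetical, and stripping a trailing separator before the suffix check is an assumption about how such paths might be normalized.

```python
from typing import Set


def expected_feature_keys(dataset_path: str) -> Set[str]:
    """Returns the feature keys implied by the dataset path."""
    # Assumption: ignore a trailing separator so "data/trainvaltest/" matches.
    if dataset_path.rstrip("/").endswith("trainvaltest"):
        return {"X_train", "X_val", "X_test"}
    return {"X_pred"}


train_keys = expected_feature_keys("sample_data/pokemon/trainvaltest")
pred_keys = expected_feature_keys("sample_data/pokemon/pred")
```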
- get_raw_features_and_labels(dataset_path: str) → Tuple[Dict[str, numpy.ndarray], Dict[str, numpy.ndarray]]
Returns the raw feature and label tensors from the dataset path. This method is specifically used for the train/val/test sets and not input data for prediction, because in some cases the features and labels need to be read simultaneously to ensure proper ordering of features and labels.
Raw features are tensors of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1]. Raw labels are tensors of shape m x 2, where m is the number of examples. Entries are strings from CLASSES naming the 1 or 2 (if multi-typed) types that belong to each sample. Types are not ordered.
- Parameters
dataset_path – The path to the file or directory on the local or remote filesystem containing the dataset, specifically train/val/test and not prediction data.
- Returns
A 2-tuple of the features dictionary and labels dictionary, with matching keys and ordered tensors.
- static get_valid_prediction(pred_arr: numpy.ndarray, threshold: float = 0.5) → numpy.ndarray
Returns a valid binary prediction from the raw prediction tensor. A valid prediction has one or two 1s, and all other entries are 0. The highest value in the prediction array is automatically converted to a 1, and the second-highest is converted to a 1 if the value is higher than the given decision threshold.
- Parameters
pred_arr – The raw model predictions; a tensor of shape m x k, where m is the number of examples and k is the number of classes. All entries are in the range [0, 1].
threshold – The decision threshold, in the range [0, 1]. If the second-highest value in pred_arr is greater than this threshold, it will be converted to a 1. The highest value is automatically converted to a 1 (Pokemon have at least 1 type).
- Returns
The valid binary predictions; a tensor of shape m x k, where m is the number of examples and k is the number of classes. All entries are in the set {0, 1}, and in each example there are 1 or 2 ones.
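The thresholding rule described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the method's actual code: the function name, the integer output dtype, and the sample scores are made up, and the sketch requires at least two classes. Ties between scores are broken arbitrarily by the sort.

```python
import numpy as np


def valid_prediction_sketch(pred_arr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keeps the top class per row, plus the runner-up if it beats threshold."""
    # Column indices sorted by ascending score within each row.
    order = np.argsort(pred_arr, axis=1)
    top, second = order[:, -1], order[:, -2]
    rows = np.arange(pred_arr.shape[0])

    valid = np.zeros(pred_arr.shape, dtype=int)
    valid[rows, top] = 1  # The highest score always becomes a 1.
    # The runner-up becomes a 1 only when strictly above the threshold.
    keep_second = pred_arr[rows, second] > threshold
    valid[rows[keep_second], second[keep_second]] = 1
    return valid


preds = np.array([[0.9, 0.6, 0.1], [0.4, 0.3, 0.2]])
result = valid_prediction_sketch(preds)
# result is [[1, 1, 0], [1, 0, 0]]
```

In the second row no score exceeds the threshold, yet the highest one is still set to 1, which matches the rule that every example receives at least one type.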
- preprocess_features(raw_feature_tensor: numpy.ndarray) → numpy.ndarray
Returns the preprocessed feature tensor from the raw tensor. Preprocessed features are the form in which training/validation/test data, as well as prediction data, are fed into downstream models. The preprocessed tensors are of shape m x h x w x c, where m is the number of images, h is the image height, w is the image width, and c is the number of channels (3 for RGB), with all values in the interval [0, 1].
- Parameters
raw_feature_tensor – The raw features to be preprocessed.
- Returns
The preprocessed feature tensor. This tensor is ready for downstream model consumption.
- preprocess_labels(raw_label_tensor: numpy.ndarray) → numpy.ndarray
Returns the preprocessed label tensor from the raw tensor. Preprocessed labels are the form in which training/validation/test labels are fed into downstream models. Preprocessed labels are tensors of shape m x k, where m is the number of examples and k is the number of classes. Each k-length vector is a binary, multi-label encoding with a minimum of 1 and a maximum of 2 ones per vector.
- Parameters
raw_label_tensor – The raw labels to be preprocessed.
- Returns
The preprocessed label tensor. This tensor is ready for downstream model consumption.
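A minimal sketch of such a multi-label encoding is shown below. The shortened CLASSES list, the function name, the float32 output dtype, and the use of an empty string for a missing second type are all assumptions made for illustration; the package defines its own class list and raw label layout.

```python
import numpy as np

# Hypothetical, shortened class list; the real CLASSES is defined by the package.
CLASSES = ["Fire", "Grass", "Water"]


def encode_labels(raw_labels: np.ndarray) -> np.ndarray:
    """Maps an m x 2 array of type names to an m x k binary array."""
    index = {name: col for col, name in enumerate(CLASSES)}
    encoded = np.zeros((raw_labels.shape[0], len(CLASSES)), dtype=np.float32)
    for row, types in enumerate(raw_labels):
        for name in types:
            # Assumption: a missing second type is stored as a value not in CLASSES.
            if name in index:
                encoded[row, index[name]] = 1.0
    return encoded


raw = np.array([["Fire", "Water"], ["Grass", ""]])
encoded = encode_labels(raw)
# encoded is [[1, 0, 1], [0, 1, 0]]
```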
- unpreprocess_features(feature_tensor: numpy.ndarray) → numpy.ndarray
Returns the raw feature tensor from the preprocessed tensor; inverts preprocessing. Improves model interpretability by enabling users to transform model inputs into real-world values.
- Parameters
feature_tensor – The preprocessed features to be inverted.
- Returns
The raw feature tensor.
- unpreprocess_labels(label_tensor: numpy.ndarray) → numpy.ndarray
Returns the raw label tensor from the preprocessed tensor; inverts preprocessing. Improves model interpretability by enabling users to transform model outputs into real-world values.
- Parameters
label_tensor – The preprocessed labels to be inverted.
- Returns
The raw label tensor.
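Inverting a binary multi-label encoding could look like the sketch below. The shortened CLASSES list and the function name are hypothetical, and for simplicity the sketch returns a list of type-name lists rather than the m x 2 tensor that the actual method returns.

```python
import numpy as np

# Hypothetical, shortened class list; the real CLASSES is defined by the package.
CLASSES = ["Fire", "Grass", "Water"]


def decode_labels(label_tensor: np.ndarray) -> list:
    """Maps an m x k binary array back to a list of type-name lists."""
    return [
        # flatnonzero yields the column indices of the nonzero entries in a row.
        [CLASSES[col] for col in np.flatnonzero(row)]
        for row in label_tensor
    ]


decoded = decode_labels(np.array([[1, 0, 1], [0, 1, 0]]))
# decoded is [["Fire", "Water"], ["Grass"]]
```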
mlops.examples.image.classification.publish_dataset module
Publishes a new dataset to the local or remote filesystem. This script should be run any time the data processor changes.
- mlops.examples.image.classification.publish_dataset.main() → None
Runs the program.
- mlops.examples.image.classification.publish_dataset.publish_dataset(publication_path: str) → str
Builds and publishes the dataset.
- Parameters
publication_path – The path on the local or remote filesystem to which to publish the dataset.
- Returns
The versioned dataset’s publication path.
mlops.examples.image.classification.train_model module
Trains a new model on the Pokemon classification task.
- mlops.examples.image.classification.train_model.get_baseline_model(dataset: mlops.dataset.versioned_dataset.VersionedDataset) → keras.engine.training.Model
Returns a new Keras Model for use on the dataset. This model is only a baseline; developers should also experiment with custom models in notebook environments.
- Parameters
dataset – The input dataset. Used to determine model input and output shapes.
- Returns
A new Keras Model for use on the dataset.
- mlops.examples.image.classification.train_model.main() → None
Runs the program.
- mlops.examples.image.classification.train_model.publish_model(model: keras.engine.training.Model, dataset: mlops.dataset.versioned_dataset.VersionedDataset, training_config: mlops.model.training_config.TrainingConfig, publication_path: str, tags: Optional[List[str]] = None) → str
Publishes the model to the path on the local or remote filesystem.
- Parameters
model – The model to be published, with the exact weights desired for publication (the user needs to set the weights to the best found during training if that is what they desire).
dataset – The input dataset.
training_config – The training configuration.
publication_path – The path to which the model will be published.
tags – Optional tags for the published model.
- Returns
The versioned model’s publication path.
- mlops.examples.image.classification.train_model.train_model(model: keras.engine.training.Model, dataset: mlops.dataset.versioned_dataset.VersionedDataset, model_checkpoint_filename: Optional[str] = None, **fit_kwargs: Any) → mlops.model.training_config.TrainingConfig
Trains the model on the dataset and returns the training configuration object.
- Parameters
model – The Keras Model to be trained.
dataset – The input dataset.
model_checkpoint_filename – If supplied, saves model checkpoints to the specified path.
fit_kwargs – Keyword arguments to be passed to model.fit().
- Returns
The training configuration.
Module contents
Contains an example of an image classification task.