AWS AutoML: A Practical Guide - Part 5: Model Deployment and Inference

This blog shows you how to deploy and use your machine learning model in production using AWS AutoML and AWS SageMaker. You will learn how to create, invoke, monitor, and update a model endpoint.

1. Introduction

In this blog, you will learn how to deploy and use your machine learning model in production using AWS AutoML and AWS SageMaker. You will also learn how to monitor and update your model endpoint as needed.

Model deployment is the process of making your trained machine learning model available for inference, which is the process of generating predictions from new data. AWS AutoML allows you to easily create and train machine learning models using a no-code interface. AWS SageMaker is a fully managed service that provides you with the tools and infrastructure to deploy and use your models in production.

By the end of this blog, you will be able to:

Create a model endpoint using AWS AutoML and AWS SageMaker.
Invoke the model endpoint from different sources, such as AWS Console, AWS CLI, and Python SDK.
Monitor the model endpoint using metrics and logs.
Set alarms and notifications for the model endpoint.
Update the model endpoint with a new model version or a new endpoint configuration.

Before you start, you will need the following:

An AWS account with access to AWS AutoML and AWS SageMaker.
A trained machine learning model using AWS AutoML.
A basic understanding of AWS services and concepts.

Are you ready to deploy and use your machine learning model in production? Let’s get started!

2. Creating a Model Endpoint

After you have trained your machine learning model using AWS AutoML, you need to create a model endpoint to make it available for inference. A model endpoint is a web service that allows you to send requests to your model and receive predictions in response. You can create a model endpoint using AWS SageMaker, which provides you with the infrastructure and tools to deploy and manage your models in production.

To create a model endpoint, you need to follow these steps:

Export your trained model from AWS AutoML to AWS SageMaker.
Create a SageMaker model that references the model artifact and the inference container.
Create an endpoint configuration using AWS SageMaker.
Create an endpoint using AWS SageMaker.
Test your endpoint using AWS SageMaker.

In this section, you will learn how to perform each of these steps using the AWS Console, which is a graphical user interface that allows you to interact with AWS services. You can also use the AWS CLI or the Python SDK to perform the same tasks programmatically, which you will learn in the next sections.

Let’s start by exporting your trained model from AWS AutoML to AWS SageMaker.
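The same steps can also be scripted. The following is a minimal sketch with Boto3, not a definitive implementation. It assumes you already know the inference container image and the S3 location of the model artifact produced by your AutoML job. All names, ARNs, and the ml.m5.large instance type are hypothetical placeholders. The SageMaker client is passed in as a parameter so that the functions can be exercised without AWS access.

```python
from typing import Any, Dict


def model_request(model_name: str, image_uri: str,
                  model_data_url: str, role_arn: str) -> Dict[str, Any]:
    """Arguments for create_model: one inference container plus its artifact."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
    }


def endpoint_config_request(config_name: str, model_name: str,
                            instance_type: str = "ml.m5.large",
                            instance_count: int = 1) -> Dict[str, Any]:
    """Arguments for create_endpoint_config with a single production variant."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }


def deploy(sm_client: Any, model_req: Dict[str, Any],
           config_req: Dict[str, Any], endpoint_name: str) -> str:
    """Create the model, the endpoint configuration, and the endpoint, then wait."""
    sm_client.create_model(**model_req)
    sm_client.create_endpoint_config(**config_req)
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_req["EndpointConfigName"],
    )
    # Blocks until the endpoint reaches the InService status.
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return endpoint_name


# Usage (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   sm = boto3.client("sagemaker")
#   deploy(sm,
#          model_request("churn-model", "<image-uri>", "s3://<bucket>/model.tar.gz", "<role-arn>"),
#          endpoint_config_request("churn-config", "churn-model"),
#          "churn-endpoint")
```

If your AutoML candidate ships more than one inference container, create_model also accepts a Containers list in place of PrimaryContainer.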

3. Invoking the Model Endpoint

Once you have created a model endpoint, you can invoke it to get predictions from your machine learning model. You can invoke the model endpoint from different sources, such as the AWS Console, the AWS CLI, or the Python SDK. In this section, you will learn how to invoke the model endpoint from each of these sources and see the results.

To invoke the model endpoint, you need to send a request to the endpoint with the input data that you want to get predictions for. The input data must have the same features, in the same column order, as the data that you used to train your model, without the target column. The model endpoint will process the request and return a response with the predictions, together with metadata such as the content type of the response and the name of the production variant that served the request.

The format of the request and the response may vary depending on the type of the model and the source of the invocation. For example, if your model is a regression model, the response will contain a single numeric value for each input. If your model is a classification model, the response will typically contain the predicted label for each input and, if the endpoint is configured to return them, the class probabilities. You can also specify the content type of the request and the accept type of the response, such as JSON, CSV, or plain text.

In this blog, we will assume that your model is a binary classification model that predicts whether a customer will churn or not based on some features, such as age, gender, and tenure. The input data is a CSV file with one row per customer and one column per feature. The output data is a JSON file with one object per customer and two fields: predicted_label and predicted_probability.

Let’s see how to invoke the model endpoint from different sources and get predictions for your input data.
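Before looking at the individual sources, it may help to see the request and response formats in code. The sketch below builds a header-less CSV payload and parses a sample JSON response. The feature values are made up, and the top-level "predictions" key is an assumption for illustration. The text above only states that there is one object per customer with predicted_label and predicted_probability, so check the actual response of your own endpoint.

```python
import csv
import io
import json
from typing import Dict, List, Sequence


def rows_to_csv(rows: Sequence[Sequence[object]]) -> str:
    """Serialize feature rows to header-less CSV, one customer per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def parse_predictions(response_body: str) -> List[Dict[str, object]]:
    """Return the per-customer objects from the JSON response.

    Assumes the hypothetical shape {"predictions": [{...}, ...]}.
    """
    return json.loads(response_body)["predictions"]


# Two made-up customers with the features age, gender, and tenure.
payload = rows_to_csv([[42, "F", 18], [35, "M", 3]])

# A hand-written sample response, not the output of a real endpoint.
sample_response = json.dumps({
    "predictions": [
        {"predicted_label": 0, "predicted_probability": 0.91},
        {"predicted_label": 1, "predicted_probability": 0.77},
    ]
})

for customer in parse_predictions(sample_response):
    print(customer["predicted_label"], customer["predicted_probability"])
```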

3.1. Invoking from AWS Console

The easiest way to invoke your model endpoint is from the AWS Console, which is a graphical user interface that allows you to interact with AWS services. You can use the AWS Console to send requests to your model endpoint and see the responses in real time. You can also download the responses as a JSON file for further analysis.

To invoke your model endpoint from the AWS Console, you need to follow these steps:

Log in to your AWS account and navigate to the AWS SageMaker service.
On the left sidebar, click on Endpoints and select the endpoint that you created in the previous section.
On the endpoint details page, click on the Invoke endpoint button on the top right corner.
On the invoke endpoint page, you can choose the content type and the accept type of the request and the response. For this blog, we will use text/csv as the content type and application/json as the accept type.
You can also choose the input data source and the output data destination. For this blog, we will use Upload file from local as the input data source and Download file to local as the output data destination.
Click on the Browse button and select the CSV file that contains the input data that you want to get predictions for. The file should have one row per customer and one column per feature, as described in the previous section.
Click on the Invoke button to send the request to the model endpoint. You will see a message that says Invoking endpoint… while the request is being processed.
After the request is completed, you will see a message that says Endpoint invoked successfully and a link to download the response as a JSON file. You can also see the response in the Output section below the link.
The response will contain one object per customer and two fields: predicted_label and predicted_probability. The predicted_label will be either 0 or 1, indicating whether the customer will churn or not. The predicted_probability will be a number between 0 and 1, indicating the confidence of the prediction.

Congratulations! You have successfully invoked your model endpoint from the AWS Console and got predictions for your input data. You can now analyze the results and see how well your model performs on new data. You can also invoke your model endpoint from other sources, such as the AWS CLI or the Python SDK, which you will learn in the next sections.

3.2. Invoking from AWS CLI

Another way to invoke your model endpoint is from the AWS CLI, which is a command-line interface that allows you to interact with AWS services. You can use the AWS CLI to send requests to your model endpoint and see the responses in your terminal. You can also save the responses as a JSON file for further analysis.

To invoke your model endpoint from the AWS CLI, you need to follow these steps:

Install and configure the AWS CLI on your local machine. You can follow the official AWS CLI documentation to install it, and then run aws configure to set your AWS credentials and default region.
Prepare the input data that you want to get predictions for. The input data should be a CSV file with one row per customer and one column per feature, as described in the previous section. You can use any text editor or spreadsheet software to create the CSV file.
Open a terminal and use the aws sagemaker-runtime invoke-endpoint command to send the request to the model endpoint. You need to specify the following parameters:

--endpoint-name: The name of the endpoint that you created in the previous section.
--body: The input data. Use the fileb:// prefix followed by the path to the CSV file so that the AWS CLI sends the raw file contents.
--content-type: The content type of the request. For this blog, we will use text/csv.
--accept: The accept type of the response. For this blog, we will use application/json.
output.json: The name of the JSON file that will store the response.

The command will look something like this:

aws sagemaker-runtime invoke-endpoint --endpoint-name my-endpoint --body fileb://input.csv --content-type text/csv --accept application/json output.json

After the command is executed, the AWS CLI prints a short JSON summary of the call, including the ContentType of the response and the InvokedProductionVariant that served it. The predictions themselves are written to output.json, which you can open to see the response in JSON format.
The response has the same structure as in the previous section: one object per customer with two fields, predicted_label (either 0 or 1) and predicted_probability (a number between 0 and 1 indicating the confidence of the prediction).

Well done! You have successfully invoked your model endpoint from the AWS CLI and got predictions for your input data. You can now analyze the results and see how well your model performs on new data. You can also invoke your model endpoint from other sources, such as the Python SDK, which you will learn in the next section.
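If you want to script the CLI call, for example from a scheduled job, one option is to assemble the same command in Python and run it with subprocess. This is only a sketch. It assumes the aws executable is on your PATH and already configured, and the endpoint and file names are the placeholders used above.

```python
import subprocess
from typing import List


def invoke_endpoint_command(endpoint_name: str, input_path: str,
                            output_path: str) -> List[str]:
    """Build the argument list for aws sagemaker-runtime invoke-endpoint."""
    return [
        "aws", "sagemaker-runtime", "invoke-endpoint",
        "--endpoint-name", endpoint_name,
        # fileb:// makes the CLI send the raw bytes of the file.
        "--body", f"fileb://{input_path}",
        "--content-type", "text/csv",
        "--accept", "application/json",
        output_path,
    ]


def run_invoke(endpoint_name: str, input_path: str = "input.csv",
               output_path: str = "output.json") -> str:
    """Run the command and return the CLI summary printed to stdout.

    The predictions are written to output_path, not returned here.
    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        invoke_endpoint_command(endpoint_name, input_path, output_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```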

3.3. Invoking from Python SDK

In this section, you will learn how to invoke your model endpoint from the Python SDK, which is a programming interface that allows you to interact with AWS services using Python code. You can use the Python SDK to invoke your model endpoint from any application or environment that supports Python, such as Jupyter Notebook, PyCharm, or AWS Lambda.

To invoke your model endpoint from the Python SDK, you need to follow these steps:

Install and configure the AWS SDK for Python (Boto3).
Import the Boto3 module and create a SageMaker Runtime client object (service name sagemaker-runtime).
Prepare the input data for your model endpoint.
Call the invoke_endpoint method of the SageMaker Runtime client object and pass the input data, the content type, and the endpoint name as parameters.
Parse the response from the model endpoint and extract the predictions.

In this section, you will learn how to perform each of these steps using a simple example. You will use the churn prediction endpoint that you created and deployed using AWS AutoML and AWS SageMaker in the previous sections.

Let’s start by installing and configuring the AWS SDK for Python (Boto3).
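Once Boto3 is installed (for example with pip install boto3) and your credentials are configured, the remaining steps fit in a few lines. The sketch below takes the runtime client as a parameter so that it can be tried with a stand-in object. The endpoint name and the CSV payload are placeholders, and the JSON shape of the response depends on your model.

```python
import json
from typing import Any


def invoke_csv(runtime_client: Any, endpoint_name: str, csv_payload: str) -> Any:
    """Send CSV rows to the endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=csv_payload.encode("utf-8"),
    )
    # response["Body"] is a stream; read() returns the raw response bytes.
    return json.loads(response["Body"].read().decode("utf-8"))


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = invoke_csv(runtime, "my-endpoint", "42,F,18\n35,M,3\n")
#   print(result)
```

Passing the client in, rather than creating it inside the function, also makes it straightforward to reuse one client across many requests, for example inside an AWS Lambda handler.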

4. Monitoring the Model Endpoint

Once you have created and invoked your model endpoint, you need to monitor its performance and health. Monitoring your model endpoint allows you to track its availability, latency, throughput, errors, and other metrics that can help you optimize its performance and reliability. You can also set alarms and notifications to alert you when your model endpoint experiences any issues or anomalies.

To monitor your model endpoint, you can use the following tools and services:

AWS CloudWatch, which is a service that collects and displays metrics and logs from your AWS resources, such as your model endpoint.
AWS SageMaker Studio, which is an integrated development environment (IDE) that provides you with a graphical user interface to view and analyze the metrics and logs from your model endpoint.
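AWS SageMaker Model Monitor, which is a feature that captures the requests and responses of your endpoint and analyzes them for data quality and model quality drift. (AWS SageMaker Debugger, by contrast, inspects tensors, gradients, and weights during training rather than during inference.)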

In this section, you will learn how to use these tools and services to monitor your model endpoint from the AWS Console. You can also perform the same tasks programmatically with the AWS CLI or the Python SDK.

Let’s start by viewing the metrics and logs from your model endpoint using AWS CloudWatch.

4.1. Viewing Metrics and Logs

AWS CloudWatch is a service that collects and displays metrics and logs from your AWS resources, such as your model endpoint. Metrics are numerical values that measure the performance and health of your resources, such as the number of invocations, the average latency, the error rate, and the CPU utilization. Logs are text records that capture the events and activities of your resources, such as the input and output data, the errors and warnings, and the debugging information.

To view the metrics and logs from your model endpoint using AWS CloudWatch, you need to follow these steps:

Open the AWS Console and navigate to the CloudWatch service.
On the left navigation pane, click on Metrics and then select SageMaker from the list of namespaces.
On the right panel, you will see a list of metrics for your model endpoint, such as Invocations, ModelLatency, OverheadLatency, Invocation4XXErrors, and Invocation5XXErrors. Instance-level metrics such as CPUUtilization and MemoryUtilization are published under the separate /aws/sagemaker/Endpoints namespace. You can select one or more metrics to view their graphs and statistics over time.
On the left navigation pane, click on Logs and then select Log groups from the drop-down menu.
On the right panel, you will see the log group for your model endpoint, named /aws/sagemaker/Endpoints/ followed by the name of your endpoint. You can select the log group to view its log streams, which are sequences of log events from a specific source.
You can select a log stream to view its log events, which are text records that capture the events and activities of your model endpoint. You can also filter, search, and export the log events for further analysis.

By viewing the metrics and logs from your model endpoint using AWS CloudWatch, you can gain insights into the performance and health of your model endpoint, and identify any issues or anomalies that may affect its functionality.
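The same metrics can be read programmatically. As a rough sketch, the function below asks CloudWatch for the average ModelLatency of one endpoint variant in 5-minute buckets. The variant name AllTraffic and the one-hour window are assumptions, and the CloudWatch client is passed in so that the code can be tried with a stand-in object.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional, Tuple


def average_model_latency_ms(cw_client: Any, endpoint_name: str,
                             variant_name: str = "AllTraffic",
                             hours: int = 1,
                             now: Optional[datetime] = None
                             ) -> List[Tuple[datetime, float]]:
    """Return (timestamp, average latency in ms) pairs, oldest first."""
    end = now or datetime.now(timezone.utc)
    response = cw_client.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="ModelLatency",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,  # seconds per bucket
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    # ModelLatency is reported in microseconds; convert to milliseconds.
    return [(p["Timestamp"], p["Average"] / 1000.0) for p in points]


# Usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   for ts, ms in average_model_latency_ms(cw, "my-endpoint"):
#       print(ts.isoformat(), round(ms, 1), "ms")
```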

4.2. Setting Alarms and Notifications

Once you have created and tested your model endpoint, you may want to set up alarms and notifications to monitor its performance and health. Alarms and notifications can help you detect and respond to issues such as high latency, low throughput, or errors in your model endpoint. You can use AWS CloudWatch to create and manage alarms and notifications for your model endpoint.

With AWS CloudWatch, you can create alarms that compare an endpoint metric, such as Invocations, Invocation5XXErrors, or ModelLatency, against a threshold that you choose. When an alarm is triggered, AWS CloudWatch can publish to an AWS SNS topic, which in turn delivers the notification to your email, SMS, or other channels.

To set up alarms and notifications for your model endpoint, you need to follow these steps:

Create a topic and a subscription using AWS SNS.
Create an alarm and an action using AWS CloudWatch.
Verify and test your alarm and notification using AWS CloudWatch.

In this section, you will learn how to perform each of these steps using the AWS Console. You can also use the AWS CLI or the Python SDK to perform the same tasks programmatically, which you can learn from the official documentation.

Let’s start by creating a topic and a subscription using AWS SNS.
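For reference, the three steps can also be expressed with Boto3. The sketch below creates a topic, subscribes an email address, and creates an alarm on server-side errors. The topic and alarm names, the threshold of 5 errors per 5 minutes, and the AllTraffic variant name are all illustrative assumptions that you should adapt to your own endpoint.

```python
from typing import Any


def create_error_alarm(sns_client: Any, cw_client: Any, endpoint_name: str,
                       email: str, variant_name: str = "AllTraffic",
                       threshold: int = 5) -> str:
    """Notify `email` when 5XX errors exceed `threshold` in a 5-minute window."""
    # Step 1: topic and subscription (the recipient must confirm by email).
    topic_arn = sns_client.create_topic(Name=f"{endpoint_name}-alerts")["TopicArn"]
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    # Step 2: alarm whose action publishes to the topic.
    cw_client.put_metric_alarm(
        AlarmName=f"{endpoint_name}-5xx-errors",
        Namespace="AWS/SageMaker",
        MetricName="Invocation5XXErrors",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=float(threshold),
        ComparisonOperator="GreaterThanThreshold",
        # No traffic means no datapoints; do not treat that as a failure.
        TreatMissingData="notBreaching",
        AlarmActions=[topic_arn],
    )
    return topic_arn


# Usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   create_error_alarm(boto3.client("sns"), boto3.client("cloudwatch"),
#                      "my-endpoint", "you@example.com")
```

To test the notification path without waiting for real errors, one option is the CloudWatch set_alarm_state call, which temporarily forces the alarm into the ALARM state.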

5. Updating the Model Endpoint

As you use your model endpoint for inference, you may want to update it with a new model version or a new endpoint configuration. A new model version may have improved accuracy or performance, while a new endpoint configuration may have different resource allocation or scaling options. You can update your model endpoint using AWS SageMaker, which allows you to modify your endpoint without interrupting your inference requests.

To update your model endpoint, you need to follow these steps:

Create a new model artifact and a new model container using AWS SageMaker.
Create a new endpoint configuration using AWS SageMaker.
Update your endpoint using AWS SageMaker.
Test your updated endpoint using AWS SageMaker.

Let’s start by creating a new model artifact and a new model container using AWS SageMaker.

5.1. Updating the Model Version

If you have trained a new version of your machine learning model using AWS AutoML, you may want to update your model endpoint with the new version. Updating the model version can improve the accuracy or performance of your model endpoint, as well as enable new features or functionalities. You can update your model version using AWS SageMaker, which allows you to replace the existing model artifact and model container with the new ones.

To update your model version, you need to follow these steps:

Export your new trained model from AWS AutoML to AWS SageMaker.
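Create a new SageMaker model that references the new model artifact and inference container.
Create a new endpoint configuration that references the new model, and update your endpoint to use it. Existing endpoint configurations cannot be edited in place.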
Test your updated endpoint using AWS SageMaker.

Let’s start by exporting your new trained model from AWS AutoML to AWS SageMaker.
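In code, rolling out a new model version comes down to three calls. The following sketch assumes that the new artifact is already in S3 and uses a hypothetical naming scheme of endpoint name plus version suffix. The image URI, S3 path, role ARN, and instance type are placeholders.

```python
from typing import Any


def roll_out_new_model(sm_client: Any, endpoint_name: str, version: str,
                       image_uri: str, model_data_url: str, role_arn: str,
                       instance_type: str = "ml.m5.large",
                       instance_count: int = 1) -> str:
    """Register a new model version and point the existing endpoint at it."""
    model_name = f"{endpoint_name}-model-{version}"
    config_name = f"{endpoint_name}-config-{version}"

    sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
    )
    # Endpoint configurations are immutable, so a new one is created.
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    )
    # The endpoint keeps serving the old model until the switch completes.
    sm_client.update_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=config_name
    )
    return config_name
```

Keeping the old model and endpoint configuration around until the new version has been tested makes it possible to roll back by calling update_endpoint with the previous configuration name.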

5.2. Updating the Endpoint Configuration

Sometimes, you may want to update the endpoint configuration of your model endpoint, which defines the resources and settings for your endpoint. For example, you may want to change the instance type or the number of instances for your endpoint, or you may want to enable data capture or encryption for your endpoint. Updating the endpoint configuration allows you to modify these aspects of your endpoint without changing the model itself.

To update the endpoint configuration, you need to follow these steps:

Create a new endpoint configuration using AWS SageMaker.
Update the endpoint with the new endpoint configuration using AWS SageMaker.
Wait for the endpoint to switch to the new endpoint configuration.

Let’s start by creating a new endpoint configuration using AWS SageMaker.
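As an illustration, the sketch below builds a new endpoint configuration with a different instance type and count and optional data capture, applies it, and polls until the endpoint leaves its transitional state. The configuration name, S3 URI, sampling percentage, and polling interval are assumptions. The sleep parameter exists only so that the loop can be tried without real waiting.

```python
import time
from typing import Any, Callable, Dict, Optional


def new_endpoint_config(config_name: str, model_name: str, instance_type: str,
                        instance_count: int,
                        capture_s3_uri: Optional[str] = None) -> Dict[str, Any]:
    """Arguments for create_endpoint_config, optionally with data capture."""
    request: Dict[str, Any] = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }
    if capture_s3_uri:
        request["DataCaptureConfig"] = {
            "EnableCapture": True,
            "InitialSamplingPercentage": 100,
            "DestinationS3Uri": capture_s3_uri,
            "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        }
    return request


def apply_config(sm_client: Any, endpoint_name: str, config_request: Dict[str, Any],
                 poll_seconds: int = 30,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Create the configuration, switch the endpoint to it, return the final status."""
    sm_client.create_endpoint_config(**config_request)
    sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_request["EndpointConfigName"],
    )
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status not in ("Updating", "SystemUpdating", "RollingBack"):
            return status  # typically InService, or Failed if the update failed
        sleep(poll_seconds)
```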

6. Conclusion

In this blog, you have learned how to deploy and use your machine learning model in production using AWS AutoML and AWS SageMaker. You have also learned how to monitor and update your model endpoint as needed.

You have covered the following topics:

How to export your trained model from AWS AutoML to AWS SageMaker.
How to create a model endpoint using AWS SageMaker.
How to invoke the model endpoint from different sources, such as AWS Console, AWS CLI, and Python SDK.
How to monitor the model endpoint using metrics and logs.
How to set alarms and notifications for the model endpoint.
How to update the model endpoint with a new model version or a new endpoint configuration.

By following this blog, you have gained a practical understanding of how to use AWS AutoML and AWS SageMaker to create and manage your machine learning models in production. You have also learned some best practices and tips for optimizing your model deployment and inference.

We hope you have enjoyed this blog and found it useful. If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading!
