Title
AWS re:Invent 2022 - Deploy ML models for inference at high performance & low cost, ft AT&T (AIM302)
Summary
- Venkatesh Krishnan and Rama Thaman from Amazon SageMaker, along with Matt and Antoine from AT&T, discuss deploying machine learning models for inference with high performance and low cost using Amazon SageMaker.
- Machine learning is becoming integral to business applications, with a prediction that one in three applications will incorporate ML by 2026.
- SageMaker allows for the deployment of models behind an endpoint, supporting real-time inference, batch transforms, and asynchronous inference.
- SageMaker can handle complex models, including those with billions of parameters, by parallelizing large models across GPUs or using AWS Inferentia instances for cost savings.
- Performance optimization in SageMaker includes low-overhead, low-latency request handling, smart request routing for higher throughput, and multi-model endpoints for scalability.
- Cost optimization is achieved through selecting the right instances, auto-scaling, serverless inference, and multi-model endpoints, which can save up to 90% of deployment costs.
- MLOps tools on SageMaker automate model building workflows, provide CI/CD templates, deployment guardrails, model monitoring, and model registry for operational excellence.
- Rama emphasizes the challenges of building and managing ML infrastructure in-house and the benefits of using SageMaker, including compliance, cost optimization, and MLOps capabilities.
- AT&T's experience with SageMaker is shared, highlighting the use of multi-model endpoints, EventBridge for orchestration, and AWS support in reducing the time to production.
- The architecture involves Glue for data extraction, orchestrator microservices for managing training jobs, and multi-model endpoints for model deployment.
- AT&T achieved significant cost savings and performance goals with SageMaker, reducing the projected timeline from two years to four months.
- The data science aspect involved using Isolation Forest for anomaly detection, SageMaker Notebooks for exploration, SageMaker Processing for feature engineering, and SageMaker Clarify for model monitoring and validation.
- AT&T managed to reduce the number of models needed by clustering similar models, which simplified operations and addressed the ML cold start problem.
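The anomaly-detection approach described above can be sketched with scikit-learn's `IsolationForest`. This is a minimal illustration, not AT&T's actual pipeline: the synthetic features, contamination rate, and hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in features; in the talk, real features were engineered
# with SageMaker Processing from data extracted via AWS Glue.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
anomalies = rng.normal(loc=8.0, scale=1.0, size=(5, 3))
X = np.vstack([normal, anomalies])

# contamination is the assumed fraction of anomalous points in the data
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(X)

# predict() returns 1 for inliers and -1 for anomalies
labels = model.predict(X)
n_flagged = int((labels == -1).sum())
```

A trained model like this would be packaged as a `model.tar.gz` artifact and uploaded to S3 for a multi-model endpoint to load on demand.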
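Multi-model endpoints, which AT&T used to host many models behind a single endpoint, select a model per request via the `TargetModel` parameter of `invoke_endpoint`. A minimal sketch with boto3 follows; the endpoint name, per-customer artifact layout, and payload format are hypothetical assumptions, not AT&T's configuration.

```python
def target_model_key(customer_id: str) -> str:
    """Build the S3-relative artifact key for a per-customer model.

    The one-artifact-per-customer layout is an illustrative assumption;
    a multi-model endpoint lazily loads whichever artifact TargetModel
    names, so thousands of models can share one set of instances.
    """
    return f"{customer_id}/model.tar.gz"

def invoke(customer_id: str, payload: str) -> str:
    # Requires AWS credentials and a deployed multi-model endpoint to run;
    # "anomaly-detection-mme" is a hypothetical endpoint name.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="anomaly-detection-mme",
        TargetModel=target_model_key(customer_id),
        ContentType="text/csv",
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")
```

Because unused models are evicted from instance memory, this pattern is what enables the large per-model cost savings mentioned above.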
Insights
- The integration of machine learning into business applications is rapidly increasing, necessitating efficient deployment methods for ML models.
- SageMaker provides a comprehensive solution for deploying ML models, addressing the trilemma of complexity, performance, and cost.
- The ability to deploy large and complex models without significant cost implications is a key advantage of using SageMaker, especially with the support of AWS Inferentia instances.
- SageMaker's multi-model endpoints and serverless inference options offer scalability and cost-effectiveness for businesses with varying traffic patterns and unpredictable workloads.
- MLOps tools within SageMaker streamline the process of deploying and managing ML models, reducing operational overhead and facilitating continuous integration and delivery.
- AT&T's case study demonstrates the practical benefits of using SageMaker, including reduced time to production, cost savings, and the ability to handle high volumes of inference requests.
- The use of AWS Glue and EventBridge in AT&T's architecture underscores the importance of seamless data integration and orchestration in ML workflows.
- The session highlights the importance of leveraging AWS support and resources to optimize the deployment and management of ML models on SageMaker.
- Clustering similar models to reduce the number of models in production can lead to operational efficiencies and faster onboarding of ML-based detections for new customers.
- The session underscores the evolving nature of ML deployments and the need for flexible infrastructure that can adapt to changing project requirements and objectives.
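Serverless inference, noted above as a fit for unpredictable workloads, is configured per endpoint variant through a `ServerlessConfig` block with two knobs: `MemorySizeInMB` and `MaxConcurrency`. A minimal sketch with boto3; the endpoint and model names, memory size, and concurrency limit are illustrative assumptions.

```python
def serverless_variant(model_name: str,
                       memory_mb: int = 2048,
                       max_concurrency: int = 10) -> dict:
    """Build a production-variant entry carrying a ServerlessConfig block.

    With serverless inference there are no instances to manage and no
    instance types to choose; capacity scales with request traffic.
    """
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }

def create_serverless_endpoint(name: str, model_name: str) -> None:
    # Requires AWS credentials to actually run; names are hypothetical.
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName=f"{name}-config",
        ProductionVariants=[serverless_variant(model_name)],
    )
    sm.create_endpoint(EndpointName=name,
                       EndpointConfigName=f"{name}-config")
```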