Title
AWS re:Invent 2022 - Deploy ML models for inference at high performance & low cost, ft AT&T (AIM302)
Summary
- Venkatesh Krishnan and Rama Thaman from Amazon SageMaker, along with Matt and Antoine from AT&T, discuss deploying machine learning models for inference with high performance and low cost using Amazon SageMaker.
- Machine learning is becoming integral to business applications, with a prediction that one in three applications will incorporate ML by 2026.
- SageMaker allows for the deployment of models behind an endpoint, supporting real-time inference, batch transforms, and asynchronous inference.
- SageMaker can handle complex models, including those with billions of parameters, by parallelizing large models across GPUs or using AWS Inferentia instances for cost savings.
- Performance optimization in SageMaker includes low-overhead, low-latency request handling, smart request routing for higher throughput, and multi-model endpoints for scalability.
- Cost optimization is achieved through selecting the right instances, auto-scaling, serverless inference, and multi-model endpoints, which can save up to 90% of deployment costs.
- MLOps tools on SageMaker automate model building workflows, provide CI/CD templates, deployment guardrails, model monitoring, and model registry for operational excellence.
- Rama emphasizes the challenges of building and managing ML infrastructure in-house and the benefits of using SageMaker, including compliance, cost optimization, and MLOps capabilities.
- AT&T's experience with SageMaker is shared, highlighting the use of multi-model endpoints, EventBridge for orchestration, and AWS support in reducing the time to production.
- The architecture involves Glue for data extraction, orchestrator microservices for managing training jobs, and multi-model endpoints for model deployment.
- AT&T achieved significant cost savings and performance goals with SageMaker, reducing the projected timeline from two years to four months.
- The data science aspect involved using Isolation Forest for anomaly detection, SageMaker Notebooks for exploration, SageMaker Processing for feature engineering, and SageMaker Clarify for model monitoring and validation.
- AT&T managed to reduce the number of models needed by clustering similar models, which simplified operations and addressed the ML cold start problem.
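The anomaly-detection approach described above can be sketched with scikit-learn's `IsolationForest`. This is a minimal illustration, not AT&T's actual pipeline: the synthetic features, contamination rate, and hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in features; in the talk, real features were engineered
# with SageMaker Processing from data extracted via AWS Glue.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
anomalies = rng.normal(loc=8.0, scale=1.0, size=(5, 3))
X = np.vstack([normal, anomalies])

# contamination is the assumed fraction of anomalous points in the data
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(X)

# predict() returns 1 for inliers and -1 for anomalies
labels = model.predict(X)
n_flagged = int((labels == -1).sum())
```

A trained model like this would be packaged as a `model.tar.gz` artifact and uploaded to S3 for a multi-model endpoint to load on demand.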
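Multi-model endpoints, which AT&T used to host many models behind a single endpoint, select a model per request via the `TargetModel` parameter of `invoke_endpoint`. A minimal sketch with boto3 follows; the endpoint name, per-customer artifact layout, and payload format are hypothetical assumptions, not AT&T's configuration.

```python
def target_model_key(customer_id: str) -> str:
    """Build the S3-relative artifact key for a per-customer model.

    The one-artifact-per-customer layout is an illustrative assumption;
    a multi-model endpoint lazily loads whichever artifact TargetModel
    names, so thousands of models can share one set of instances.
    """
    return f"{customer_id}/model.tar.gz"

def invoke(customer_id: str, payload: str) -> str:
    # Requires AWS credentials and a deployed multi-model endpoint to run;
    # "anomaly-detection-mme" is a hypothetical endpoint name.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="anomaly-detection-mme",
        TargetModel=target_model_key(customer_id),
        ContentType="text/csv",
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")
```

Because unused models are evicted from instance memory, this pattern is what enables the large per-model cost savings mentioned above.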
Insights
- The integration of machine learning into business applications is rapidly increasing, necessitating efficient deployment methods for ML models.
- SageMaker provides a comprehensive solution for deploying ML models, addressing the trilemma of complexity, performance, and cost.
- The ability to deploy large and complex models without significant cost implications is a key advantage of using SageMaker, especially with the support of AWS Inferentia instances.
- SageMaker's multi-model endpoints and serverless inference options offer scalability and cost-effectiveness for businesses with varying traffic patterns and unpredictable workloads.
- MLOps tools within SageMaker streamline the process of deploying and managing ML models, reducing operational overhead and facilitating continuous integration and delivery.
- AT&T's case study demonstrates the practical benefits of using SageMaker, including reduced time to production, cost savings, and the ability to handle high volumes of inference requests.
- The use of AWS Glue and EventBridge in AT&T's architecture underscores the importance of seamless data integration and orchestration in ML workflows.
- The session highlights the importance of leveraging AWS support and resources to optimize the deployment and management of ML models on SageMaker.
- Clustering similar models to reduce the number of models in production can lead to operational efficiencies and faster onboarding of ML-based detections for new customers.
- The session underscores the evolving nature of ML deployments and the need for flexible infrastructure that can adapt to changing project requirements and objectives.
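Serverless inference, noted above as a fit for unpredictable workloads, is configured per endpoint variant through a `ServerlessConfig` block with two knobs: `MemorySizeInMB` and `MaxConcurrency`. A minimal sketch with boto3; the endpoint and model names, memory size, and concurrency limit are illustrative assumptions.

```python
def serverless_variant(model_name: str,
                       memory_mb: int = 2048,
                       max_concurrency: int = 10) -> dict:
    """Build a production-variant entry carrying a ServerlessConfig block.

    With serverless inference there are no instances to manage and no
    instance types to choose; capacity scales with request traffic.
    """
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }

def create_serverless_endpoint(name: str, model_name: str) -> None:
    # Requires AWS credentials to actually run; names are hypothetical.
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName=f"{name}-config",
        ProductionVariants=[serverless_variant(model_name)],
    )
    sm.create_endpoint(EndpointName=name,
                       EndpointConfigName=f"{name}-config")
```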