How Four Customers Reduced ML Inference Costs and Drove Innovation (CMP226)

Title

AWS re:Invent 2022 - How four customers reduced ML inference costs and drove innovation (CMP226)

Summary

  • Garmang Khan, product manager for Inferentia and Trainium, discusses the growing pervasiveness of AI and the trend toward ever-larger models, which drives the need for faster, cheaper deployment options.
  • AWS Inferentia and Trainium are introduced as solutions for accelerating ML inference and training while reducing costs and energy use.
  • Inferentia2, which powers the new Inf2 instances, is announced, offering higher performance and lower latency than its predecessor.
  • AWS aims to make ML more sustainable, with Inferentia reducing energy use by over 90% and Trainium saving up to 50% in energy costs.
  • Customers present their experiences with Inferentia, highlighting performance gains, cost savings, and reduced carbon emissions.
  • Screening Eagle Technologies discusses their use of AWS for predictive asset health, leveraging synthetic datasets for AI training and transitioning to Inf1 instances for cost and latency improvements.
  • Actuate shares their journey of using AWS SageMaker, Neuron, and Inferentia to improve their threat detection AI, resulting in significant cost savings and reduced deployment times (a compilation sketch follows this list).
  • Money Forward, a Japanese fintech company, details their migration to AWS Inferentia instances for their chatbot service, achieving faster inference and lower costs.
  • Dataminr, a real-time information platform, utilizes Inferentia to process billions of data inputs per day, achieving substantial increases in throughput and cost savings.
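
As a rough illustration of the migrations described above, the sketch below compiles a stock PyTorch model for Inferentia (Inf1) with the Neuron SDK's torch-neuron package. The ResNet-50 model, input shape, and file name are placeholders for illustration, not details shared by any of the customers.

```python
# Minimal sketch: compiling a PyTorch model for AWS Inferentia (Inf1) with
# the Neuron SDK. Model choice and input shape are placeholders.
import torch
import torch_neuron  # extends torch with the torch.neuron namespace
from torchvision import models

model = models.resnet50(pretrained=True)
model.eval()

# The example input fixes the shape the compiled graph will accept.
example = torch.rand(1, 3, 224, 224)

# Trace and compile for NeuronCores; operators Neuron cannot compile
# automatically fall back to CPU.
neuron_model = torch.neuron.trace(model, example_inputs=[example])
neuron_model.save("resnet50_neuron.pt")

# On an Inf1 instance, the saved artifact loads like any TorchScript model
# and executes on the Inferentia chip.
loaded = torch.jit.load("resnet50_neuron.pt")
with torch.no_grad():
    out = loaded(example)
```

The automatic CPU fallback for unsupported operators is part of what keeps this kind of migration low-effort: the compiled model keeps a plain TorchScript interface.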

Insights

  • The trend of AI becoming more pervasive and models growing in size is driving the need for specialized hardware like AWS Inferentia and Trainium to handle ML inference and training efficiently.
  • AWS is focusing not only on improving performance and reducing costs but also on sustainability, significantly lowering the energy consumption of ML applications.
  • The integration of the AWS Neuron software stack with popular ML frameworks like PyTorch and TensorFlow simplifies the deployment process for customers, as the compilation sketch above illustrates.
  • Real-world customer stories demonstrate that transitioning to AWS Inferentia can lead to dramatic cost savings, improved latency, and reduced carbon footprint without compromising on performance.
  • The use of AWS SageMaker and Neuron has enabled companies like Actuate to streamline their ML pipelines, reducing the time from data labeling to deployment from weeks to minutes (a deployment sketch follows this list).
  • AWS Inferentia's ability to optimize for either latency or throughput allows performance to be tailored to each model's requirements (see the tuning sketch after this list).
  • The success stories shared by customers at AWS re:Invent 2022 underscore the tangible benefits of AWS's ML solutions across diverse applications, from predictive asset health to threat detection and real-time information processing.
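
For the SageMaker path Actuate describes, hosting a Neuron-compiled model on an Inferentia-backed endpoint might look roughly like the sketch below. The bucket, role ARN, handler script, and framework/Python versions are all illustrative assumptions, not details from the session.

```python
# Hedged sketch: serving a Neuron-compiled PyTorch model from a SageMaker
# endpoint on Inf1 hardware. Every name below is a placeholder.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://example-bucket/resnet50_neuron.tar.gz",  # compiled artifact
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    entry_point="inference.py",  # user-written model_fn/predict_fn handler
    framework_version="1.12",    # version pairing is illustrative
    py_version="py38",
)

# Choosing an ml.inf1.* instance type is what places the endpoint on
# Inferentia; the rest of the deployment flow is unchanged.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf1.xlarge",
)
print(predictor.endpoint_name)
```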
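
On the latency-versus-throughput point, one plausible reading of how this plays out in the Neuron SDK on Inf1 (an assumption about the mechanism, not something the speakers spelled out) is compile-time NeuronCore pipelining for latency versus run-time data parallelism for throughput. The model, shapes, and core count below are placeholders.

```python
# Hedged sketch of the two tuning directions on Inf1; runs as written only
# on an Inf1 instance with the Neuron SDK installed.
import torch
import torch_neuron  # AWS Neuron SDK for PyTorch on Inf1
from torchvision import models

model = models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# Latency-oriented: shard one model across the 4 NeuronCores of an Inferentia
# chip so a single request streams through them as a pipeline.
low_latency = torch.neuron.trace(
    model,
    example_inputs=[example],
    compiler_args=["--neuroncore-pipeline-cores", "4"],
)

# Throughput-oriented: replicate the compiled model on every visible
# NeuronCore and split incoming batches (dim 0) across the replicas.
compiled = torch.neuron.trace(model, example_inputs=[example])
high_throughput = torch.neuron.DataParallel(compiled)

batch = torch.rand(16, 3, 224, 224)
with torch.no_grad():
    out = high_throughput(batch)  # batch split across NeuronCores
```

Which direction wins depends on the model and its traffic pattern, consistent with the point above that improvements are tailored per model.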