What's New with Amazon EMR (ANT302)

Title

AWS re:Invent 2022 - What’s new with Amazon EMR (ANT302)

Summary

  • Speaker: Neil Mukherjee, part of the EMR PM team.
  • Amazon EMR Overview: EMR runs big data workloads on open-source frameworks and ships recent versions of Spark, Flink, Hudi, Iceberg, Hive, and others.
  • Cost Optimization: EMR offers cost-saving options such as Reserved Instances, Savings Plans, Spot Instances, and performance-optimized runtimes.
  • Storage: EMR is built for S3 storage with a decoupled compute-storage model, allowing independent scaling.
  • Deployment Options: EMR on EC2, EMR on EKS, EMR on Outposts, and the new EMR Serverless.
  • Customer Case Studies: A global customer uses EMR for data mesh processing, and Findra uses EMR to process trillions of marketing events daily.
  • Spot Instances: EMR supports Spot Instances, with discounts of up to 90% relative to On-Demand prices, and instance fleets that diversify requests across instance types.
  • Graviton2: Many customers are moving to Graviton2 for cost and performance advantages.
  • Managed Scaling: EMR offers managed scaling for cost optimization.
  • Performance Optimizations: EMR provides performance-optimized runtimes for Spark, Presto, Trino, and Hive.
  • EMR and EKS: Offers a consolidated infrastructure for organizations using Kubernetes and EKS.
  • Transactional Data Lakes: EMR supports frameworks such as Hudi, Iceberg, and Delta Lake for transactional data lakes.
  • EMR Serverless: A new deployment option that runs EMR workloads without requiring users to configure or manage clusters and servers.
  • Interactive Notebooks: EMR Studio provides a managed IDE for interactive data analytics with JupyterLab.
  • Security: EMR offers various security features including isolation, authentication, authorization, encryption, and audit capabilities.
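
The Spot, instance fleet, Graviton2, and managed scaling points above can be made concrete with a small sketch. The Python below builds a request dictionary in the shape accepted by boto3's EMR `run_job_flow` call. It is an illustration rather than anything shown in the talk: the function name, cluster name, release label, instance types, capacities, role names, and subnet ID are all placeholder assumptions.

```python
"""Sketch: an EMR cluster request with instance fleets, Spot, and managed scaling.

Every concrete value here (names, instance types, capacities, roles) is an
illustrative assumption, not a setting taken from the talk.
"""


def build_cluster_request(subnet_id: str) -> dict:
    # Several Graviton2 instance types in one fleet, so the Spot request
    # is diversified across more than one capacity pool.
    task_instance_types = ["m6g.xlarge", "r6g.xlarge", "c6g.xlarge"]

    primary_fleet = {
        "Name": "primary",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m6g.xlarge"}],
    }
    core_fleet = {
        "Name": "core",
        "InstanceFleetType": "CORE",
        # Core nodes stay On-Demand in this sketch.
        "TargetOnDemandCapacity": 2,
        "InstanceTypeConfigs": [{"InstanceType": "m6g.xlarge", "WeightedCapacity": 1}],
    }
    task_fleet = {
        "Name": "task",
        "InstanceFleetType": "TASK",
        # Task nodes run on Spot.
        "TargetSpotCapacity": 4,
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in task_instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                "TimeoutDurationMinutes": 10,
                # Fall back to On-Demand if Spot capacity is not fulfilled in time.
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "AllocationStrategy": "capacity-optimized",
            }
        },
    }

    return {
        "Name": "example-cluster",          # placeholder
        "ReleaseLabel": "emr-6.9.0",        # assumed release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceFleets": [primary_fleet, core_fleet, task_fleet],
            "Ec2SubnetIds": [subnet_id],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Managed scaling: EMR resizes the cluster between these limits.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "InstanceFleetUnits",
                "MinimumCapacityUnits": 2,
                "MaximumCapacityUnits": 20,
                "MaximumOnDemandCapacityUnits": 4,
                "MaximumCoreCapacityUnits": 2,
            }
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }
```

Given valid credentials, roles, and a real subnet, a request like this could be submitted with `boto3.client("emr").run_job_flow(**build_cluster_request(subnet_id))`. Keeping the request as a plain dictionary makes it easy to inspect or unit-test before anything is launched.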

Insights

  • Performance: EMR's performance-optimized runtimes can be up to three times faster than open-source versions, which directly translates to cost savings.
  • Spot Instances: Spot Instances are strongly recommended for cost savings, and EMR's built-in handling of Spot interruptions minimizes job disruptions.
  • Graviton2: The shift towards Graviton2 processors indicates a trend towards ARM64 architecture for better price performance in cloud computing.
  • Managed Scaling: The managed scaling feature in EMR allows for dynamic resource allocation, which can lead to significant cost reductions.
  • EMR Serverless: The introduction of EMR Serverless addresses the demand for simplified cluster management and cost optimization, potentially attracting more customers to EMR.
  • Transactional Data Lakes: The support for Hudi, Iceberg, and Delta Lake highlights the growing importance of transactional capabilities in data lakes for compliance and data management.
  • Interactive Notebooks: The emphasis on EMR Studio and interactive notebooks suggests a shift towards more user-friendly and collaborative data science environments.
  • Security: The detailed security features in EMR demonstrate AWS's commitment to providing comprehensive security measures for big data workloads.
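
A rough back-of-the-envelope model shows why a faster runtime and Spot pricing both lower cost. The sketch below assumes cost is simply hourly price times runtime. The function name, the 9-hour baseline, and the 2.0 hourly rate are made-up illustrative values. The 3x speedup and 90% discount are the upper bounds quoted above, not typical results. EMR service fees, storage, and data transfer are ignored.

```python
def estimated_cost(baseline_hours: float, hourly_rate: float,
                   speedup: float = 1.0, discount: float = 0.0) -> float:
    """Estimate job cost as runtime * discounted hourly rate.

    speedup:  how many times faster the job runs than the baseline.
    discount: fractional price reduction (for example 0.9 for 90% off).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    runtime_hours = baseline_hours / speedup
    return runtime_hours * hourly_rate * (1.0 - discount)


# Hypothetical job: 9 hours on a cluster costing 2.0 per hour.
baseline = estimated_cost(9.0, 2.0)
faster = estimated_cost(9.0, 2.0, speedup=3.0)
faster_on_spot = estimated_cost(9.0, 2.0, speedup=3.0, discount=0.9)

# Fraction of the baseline cost saved by the speedup alone.
savings_from_speedup = 1.0 - faster / baseline
```

Under these assumptions a 3x speedup alone cuts cost by about two thirds, and the two effects multiply when the faster job also runs at a discounted rate. Real savings depend on how much of the cluster can run on Spot and on the speedup a given workload actually sees.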