Title
AWS re:Invent 2022 - What’s new with Amazon EMR (ANT302)
Summary
- Speaker: Neil Mukherjee, a member of the EMR product management team.
- Amazon EMR Overview: EMR runs big data workloads on open-source frameworks and ships recent versions of Spark, Flink, Hudi, Iceberg, Hive, and others.
- Cost Optimization: EMR offers cost-saving options such as Reserved Instances, Savings Plans, Spot Instances, and performance-optimized runtimes.
- Storage: EMR is built for S3 storage with a decoupled compute-storage model, allowing independent scaling.
- Deployment Options: EMR on EC2, EMR on EKS, EMR on Outposts, and the new EMR Serverless.
- Customer Case Studies: A global customer uses EMR for data mesh processing, and Findra uses EMR to process trillions of marketing events daily.
- Spot Instances: EMR supports Spot Instances, with discounts of up to 90% compared with On-Demand pricing, and instance fleets that diversify requests across multiple instance types.
- Graviton2: Many customers are moving to Graviton2 for cost and performance advantages.
- Managed Scaling: EMR offers managed scaling for cost optimization.
- Performance Optimizations: EMR provides performance-optimized runtimes for Spark, Presto, Trino, and Hive.
- EMR and EKS: Offers a consolidated infrastructure for organizations using Kubernetes and EKS.
- Transactional Data Lakes: EMR supports frameworks such as Hudi, Iceberg, and Delta Lake for transactional data lakes.
- EMR Serverless: A new deployment option that removes the need to configure, manage, and scale EMR clusters and servers.
- Interactive Notebooks: EMR Studio provides a managed IDE for interactive data analytics with JupyterLab.
- Security: EMR offers various security features including isolation, authentication, authorization, encryption, and audit capabilities.
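
The Spot, instance fleet, Graviton2, and managed scaling points above can be sketched as a cluster request fragment. This is a minimal sketch, not a configuration shown in the talk: it only builds part of the keyword-argument dictionary that boto3's EMR `run_job_flow` call accepts, and it never contacts AWS. The helper name `build_fleet_request`, the instance types, and every capacity value are illustrative assumptions.

```python
# Hypothetical helper: builds (but does not submit) a fragment of an EMR
# cluster request combining instance fleets, Spot capacity, Graviton2
# instance types, and a managed scaling policy.
# All names and numbers below are illustrative assumptions.

def build_fleet_request(min_units=2, max_units=20, max_on_demand_units=4):
    # Listing several Graviton2 instance types diversifies the Spot request.
    graviton2_types = ["m6g.xlarge", "r6g.xlarge", "c6g.xlarge"]

    primary_fleet = {
        "Name": "primary-fleet",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "m6g.xlarge"}],
    }
    core_fleet = {
        "Name": "core-fleet",
        "InstanceFleetType": "CORE",
        "TargetOnDemandCapacity": 2,
        "InstanceTypeConfigs": [
            {"InstanceType": "m6g.xlarge", "WeightedCapacity": 1}
        ],
    }
    task_fleet = {
        "Name": "task-fleet",
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": 0,
        "TargetSpotCapacity": 4,
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in graviton2_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # Fall back to On-Demand if Spot capacity is not fulfilled in time.
                "TimeoutDurationMinutes": 10,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "AllocationStrategy": "capacity-optimized",
            }
        },
    }
    return {
        "Instances": {
            "InstanceFleets": [primary_fleet, core_fleet, task_fleet],
        },
        # Managed scaling resizes the cluster between these limits.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "InstanceFleetUnits",
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
                "MaximumOnDemandCapacityUnits": max_on_demand_units,
            }
        },
    }


if __name__ == "__main__":
    request = build_fleet_request()
    for fleet in request["Instances"]["InstanceFleets"]:
        print(fleet["InstanceFleetType"], len(fleet["InstanceTypeConfigs"]))
```

A real request would also need fields such as a cluster name, release label, and IAM roles, which are left out here. Keeping the primary and core fleets On-Demand and placing Spot capacity only in the task fleet is one common choice for limiting the impact of Spot interruptions, not a recommendation taken from the talk.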
Insights
- Performance: EMR's performance-optimized runtimes can be up to three times faster than open-source versions, which directly translates to cost savings.
- Spot Instances: The use of spot instances is highly recommended for cost savings, and EMR's built-in handling of spot interruptions minimizes job disruptions.
- Graviton2: The shift towards Graviton2 processors indicates a trend towards ARM64 architecture for better price performance in cloud computing.
- Managed Scaling: The managed scaling feature in EMR allows for dynamic resource allocation, which can lead to significant cost reductions.
- EMR Serverless: The introduction of EMR Serverless addresses the demand for simplified cluster management and cost optimization, potentially attracting more customers to EMR.
- Transactional Data Lakes: The support for Hudi, Iceberg, and Delta Lake highlights the growing importance of transactional capabilities in data lakes for compliance and data management.
- Interactive Notebooks: The emphasis on EMR Studio and interactive notebooks suggests a shift towards more user-friendly and collaborative data science environments.
- Security: The detailed security features in EMR demonstrate AWS's commitment to providing comprehensive security measures for big data workloads.
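
The performance and Spot insights above rest on simple arithmetic, sketched below under stated assumptions: cost scales linearly with runtime, and the Spot discount is applied to the whole hourly price. The second assumption is a simplification, since any per-instance EMR charge is not reduced by Spot pricing. The function name `relative_cost` is hypothetical, and the 3x and 90% figures are the "up to" upper bounds quoted in the summary, not measured results.

```python
# Hypothetical helper: estimates job cost relative to a baseline run,
# assuming cost is proportional to runtime multiplied by hourly price.

def relative_cost(speedup=1.0, spot_discount=0.0):
    """Return the job cost as a fraction of the baseline cost.

    speedup: how many times faster the job runs (3.0 means one third the runtime).
    spot_discount: fractional discount on the hourly price (0.9 means 90% off).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0 <= spot_discount < 1:
        raise ValueError("spot_discount must be in [0, 1)")
    return (1.0 / speedup) * (1.0 - spot_discount)


if __name__ == "__main__":
    # Upper-bound figures from the summary; real workloads will vary.
    runtime_only = relative_cost(speedup=3.0)
    runtime_and_spot = relative_cost(speedup=3.0, spot_discount=0.9)
    print(f"Faster runtime only: {1 - runtime_only:.1%} saved")
    print(f"Faster runtime plus Spot: {1 - runtime_and_spot:.1%} saved")
```

Under these assumptions, a job that runs three times faster costs about one third as much, because the cluster is billed for one third of the time.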