Migrating to Amazon Emr to Reduce Costs and Simplify Operations Ant337

Title

AWS re:Invent 2022 - Migrating to Amazon EMR to reduce costs and simplify operations (ANT337)

Summary

  • LendingClub's Migration Experience:

    • LendingClub, a fintech company, migrated from an on-premise Hadoop environment to Amazon EMR.
    • They faced challenges with their on-premise setup, including limited burst capacity, difficulty in upgrades, and a lack of agility.
    • The migration to EMR allowed them to handle burst workloads, automate node recovery, and improve cost efficiency.
    • They adopted a lift-and-shift strategy, ensuring identical outputs in both on-premise and AWS environments.
    • They encountered issues with S3 performance and had to refactor jobs to work with S3 instead of HDFS.
    • They developed tools for data validation, replication monitoring, and job performance comparison.
    • They achieved better scaling with EMR managed scaling and custom metrics.
    • They plan to further optimize costs, explore containers and serverless options, and adopt new data formats like Iceberg and Hudi.
  • Thomson Reuters' Migration Experience:

    • Thomson Reuters migrated their on-premise Hadoop systems to AWS to address challenges such as multi-tenant clusters, inefficient DevOps, and lack of active-active redundancy.
    • They developed a bidirectional copying system to migrate workflows incrementally without impacting other workflows.
    • They established shared frameworks and utilities to ensure consistency across migrations.
    • They created a catalog for data and exposed it to the company, leading to new business features.
    • They moved from multi-tenant clusters to ephemeral EMR clusters, improving stability and allowing per-workflow tech stack updates.
    • They reduced compute resources by nearly half and improved time to market for new features.
    • They leveraged Apache Hoodie for near-real-time data replication and incremental updates.
    • They emphasized the importance of a strong foundation, leveraging AWS expertise, and empowering developers.

Insights

  • LendingClub's Insights:

    • Migrating to the cloud requires careful planning, especially when dealing with data fabric changes like moving from HDFS to S3.
    • Custom tooling and monitoring are crucial for ensuring data accuracy and SLA compliance during migration.
    • EMR managed scaling and custom metrics can significantly improve resource utilization and cost efficiency.
    • Open source solutions and collaboration with AWS support can help overcome migration challenges.
  • Thomson Reuters' Insights:

    • Incremental migration with a bidirectional copying system can untangle complex dependencies and facilitate a smoother transition.
    • Establishing shared frameworks, utilities, and a data catalog can standardize the migration process and improve data accessibility.
    • Ephemeral EMR clusters can address the challenges of multi-tenant environments and improve workflow stability.
    • Leveraging newer technologies like Apache Hoodie can drastically reduce data replication times and enable more dynamic data processing.
    • Empowering developers to directly engage with AWS support can expedite problem resolution and reduce bottlenecks in the migration process.