Title

AWS re:Invent 2022 - What to know before adopting Arm: Lessons learned at Datadog (PRT265)

Summary

  • Datadog's path to Arm-based AWS Graviton instances began with a 2017 hackathon project that led to the creation of the Datadog IoT Agent.
  • The company decided to adopt Arm because ARM64 hosts were becoming more common and offered the potential for better performance at lower cost.
  • The migration plan involved consulting with AWS, identifying the top services by spend, setting performance baselines, and iterating with shadow and canary deployments.
  • Three key lessons were learned during the migration:
    1. Measure performance constantly, as ARM64 is not a silver bullet for all workloads.
    2. Stay up to date with software, including the Linux kernel and dependencies, to ensure compatibility and performance improvements.
    3. Rethink code to optimize for ARM64, which can lead to significant cost savings.
  • Datadog ran into problems with Redis performance, container image builds, and tag normalization in Go, and addressed each one separately.
  • The talk concluded with a Q&A session and resources for further information.
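
The first lesson, measuring against a baseline rather than assuming ARM64 is cheaper, can be made concrete with a small price-performance calculation. The sketch below is illustrative only: the Measurement type, the function names, and every throughput and price figure are hypothetical placeholders, not numbers from the talk. It compares the cost of serving one million requests on a baseline host and on a canary host.

```go
package main

import "fmt"

// Measurement is a hypothetical benchmark result for one instance type.
// All values used in main are placeholders, not figures from the talk.
type Measurement struct {
	Name           string
	RequestsPerSec float64 // sustained throughput at the latency target
	HourlyCostUSD  float64 // hourly instance price
}

// CostPerMillion returns the cost in USD of serving one million requests.
func (m Measurement) CostPerMillion() float64 {
	requestsPerHour := m.RequestsPerSec * 3600
	return m.HourlyCostUSD / requestsPerHour * 1e6
}

// Savings returns the fractional reduction in cost per request when moving
// from baseline to candidate. A negative value means the candidate costs more.
func Savings(baseline, candidate Measurement) float64 {
	return 1 - candidate.CostPerMillion()/baseline.CostPerMillion()
}

func main() {
	baseline := Measurement{Name: "x86-64 baseline", RequestsPerSec: 10000, HourlyCostUSD: 0.40}
	canary := Measurement{Name: "arm64 canary", RequestsPerSec: 9000, HourlyCostUSD: 0.30}
	slow := Measurement{Name: "arm64 canary, slow workload", RequestsPerSec: 5000, HourlyCostUSD: 0.30}

	for _, c := range []Measurement{canary, slow} {
		fmt.Printf("%s: %.1f%% savings vs %s\n", c.Name, Savings(baseline, c)*100, baseline.Name)
	}
}
```

With these placeholder inputs, the first canary is cheaper per request despite lower raw throughput, while the second is more expensive despite the lower hourly price. That is the reason to measure each workload instead of relying on the instance price alone.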

Insights

  • Arm adoption is not only about migrating but also about achieving better price-performance, which requires careful planning, measurement, and optimization.
  • Emulation is not always the best way to build multi-architecture images, because emulated builds can perform poorly. Building on native ARM64 instances can be more efficient.
  • Keeping software up to date is crucial in an Arm environment, as updates often include Arm-specific performance and functionality improvements.
  • Rethinking code can lead to unexpected performance gains and cost savings, as demonstrated by Datadog's optimization of tag normalization.
  • Collaboration with AWS and the Graviton team can provide valuable support and insights during the migration process.
  • Datadog's experience underscores the importance of monitoring and profiling tools for identifying and addressing performance bottlenecks during Arm adoption.
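
To illustrate what rethinking code can look like for something like tag normalization, here is a minimal Go sketch of one common technique: a fast path that returns the input unchanged when it is already normalized, so the more expensive rewrite runs only when needed. This is not Datadog's implementation. The function names and the normalization rules (lowercase letters, digits, and the characters _ - : . / are kept, everything else becomes an underscore) are simplifying assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isNormalizedASCII reports whether tag contains only bytes that this
// sketch's assumed rules allow: a-z, 0-9, and the characters _ - : . /
func isNormalizedASCII(tag string) bool {
	for i := 0; i < len(tag); i++ {
		c := tag[i]
		switch {
		case c >= 'a' && c <= 'z', c >= '0' && c <= '9':
			// allowed alphanumeric byte
		case c == '_' || c == '-' || c == ':' || c == '.' || c == '/':
			// allowed punctuation byte
		default:
			return false
		}
	}
	return true
}

// NormalizeTag is a hypothetical normalizer. Already-normalized tags take
// the fast path and are returned without allocating; other tags are
// lowercased and have disallowed characters replaced with '_'.
func NormalizeTag(tag string) string {
	if isNormalizedASCII(tag) {
		return tag // fast path: byte scan only, no new string
	}
	var b strings.Builder
	b.Grow(len(tag))
	for _, r := range tag {
		r = unicode.ToLower(r)
		switch {
		case unicode.IsLetter(r), unicode.IsDigit(r):
			b.WriteRune(r)
		case strings.ContainsRune("_-:./", r):
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(NormalizeTag("env:prod"))     // already normalized, fast path
	fmt.Println(NormalizeTag("Env:Prod Web")) // slow path rewrites the tag
}
```

Whether a change like this pays off on a given architecture is something to confirm with a profiler and a benchmark on the target hosts, in line with the first lesson.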