Using Observability to Build Trust & Improve Incident Response Times (PRT317)

Title

AWS re:Invent 2022 - Using observability to build trust & improve incident response times (PRT317)

Summary

  • Jordan Spiers from New Relic opened the session, emphasizing the importance of observability in modern cloud environments and its role in operational excellence and business outcomes.
  • Mikhail, General Manager at CloudAware, and Sam Brindley from New York Life Insurance shared their journey of adopting AWS with New Relic, focusing on the concept of the observable enterprise.
  • Key objectives discussed were building trust, reducing noise, and increasing coverage through integration, automation, and governance.
  • The session covered the importance of early decision-making in cloud transformations, the need for accurate assessments, and post-migration visibility.
  • New Relic's partnership with AWS and strategic partners like CloudAware was highlighted as crucial for customer success.
  • The talk included a technical demonstration of using Terraform for alerting as code, enabling developers to autonomously manage and monitor their AWS resources with New Relic.
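
The alerting-as-code idea from the demonstration can be illustrated with a minimal Terraform sketch. This is not the configuration shown in the session. It assumes the public New Relic Terraform provider (newrelic/newrelic) and that AWS metrics are already flowing into New Relic. The variable names, policy name, NRQL query, and thresholds are hypothetical placeholders chosen for illustration.

```hcl
terraform {
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# Hypothetical input variables; supply real values through a tfvars file
# or the environment rather than hard-coding them.
variable "newrelic_account_id" {
  type = number
}

variable "newrelic_api_key" {
  type      = string
  sensitive = true
}

provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = "US"
}

# One alert policy per team or service, owned by the developers themselves.
resource "newrelic_alert_policy" "service_policy" {
  name                = "example-service-policy" # illustrative name
  incident_preference = "PER_CONDITION"
}

# A static NRQL condition on EC2 CPU. The query and threshold values are
# assumptions for illustration, not values taken from the talk.
resource "newrelic_nrql_alert_condition" "high_cpu" {
  policy_id                    = newrelic_alert_policy.service_policy.id
  type                         = "static"
  name                         = "High EC2 CPU utilization"
  enabled                      = true
  violation_time_limit_seconds = 3600

  nrql {
    query = "SELECT average(aws.ec2.CPUUtilization) FROM Metric FACET aws.ec2.InstanceId"
  }

  critical {
    operator              = "above"
    threshold             = 90
    threshold_duration    = 300
    threshold_occurrences = "all"
  }
}
```

Keeping definitions like these in the same repository as the service lets alert changes go through the normal review and deployment pipeline, which is the self-service model the speakers described.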

Insights

  • Observability is not just about monitoring but about providing actionable insights and confidence to make changes without causing downtime.
  • Modern cloud transformations require a departure from one-size-fits-all approaches, emphasizing the need for tailored solutions and early planning.
  • The concept of the observable enterprise is central to managing large-scale cloud environments, with a focus on trust, noise reduction, and coverage.
  • Integration, automation, and governance are key pillars in achieving an observable enterprise, with tools like CloudAware playing a pivotal role.
  • The shift-left approach empowers developers to take ownership of monitoring and alerting, reducing bureaucracy and enabling scalability.
  • New Relic's Terraform integration allows for alerting as code, providing developers with the tools to manage monitoring and alerting in a self-service manner.
  • The cost implications of monitoring strategies were discussed, comparing the pull method (polling the CloudWatch API) with the push method (streaming metrics through Kinesis).
  • The session underscored the importance of communication with business stakeholders to build trust and ensure they understand the state of IT operations.