Title
AWS re:Invent 2022 - Using observability to build trust & improve incident response times (PRT317)
Summary
- Jordan Spiers from New Relic opened the session, emphasizing the importance of observability in modern cloud environments and its role in operational excellence and business outcomes.
- Mikhail, General Manager at CloudAware, and Sam Brindley from New York Life Insurance shared their journey of adopting AWS with New Relic, focusing on the concept of the observable enterprise.
- Key objectives discussed were building trust, reducing noise, and increasing coverage through integration, automation, and governance.
- The session covered the importance of early decision-making in cloud transformations, the need for accurate assessments, and post-migration visibility.
- New Relic's partnership with AWS and strategic partners like CloudAware was highlighted as crucial for customer success.
- The talk included a technical demonstration of using Terraform for alerting as code, enabling developers to autonomously manage and monitor their AWS resources with New Relic.
Insights
- Observability is not just about monitoring but about providing actionable insights and confidence to make changes without causing downtime.
- Modern cloud transformations require a departure from one-size-fits-all approaches, emphasizing the need for tailored solutions and early planning.
- The concept of the observable enterprise is central to managing large-scale cloud environments, with a focus on trust, noise reduction, and coverage.
- Integration, automation, and governance are key pillars in achieving an observable enterprise, with tools like CloudAware playing a pivotal role.
- The shift-left approach empowers developers to take ownership of monitoring and alerting, reducing bureaucracy and enabling scalability.
- New Relic's Terraform integration allows for alerting as code, providing developers with the tools to manage monitoring and alerting in a self-service manner.
- Monitoring strategy has cost implications: the session compared pulling metrics through the CloudWatch API (pull) with pushing them through a Kinesis stream (push).
- The session underscored the importance of communication with business stakeholders to build trust and ensure they understand the state of IT operations.
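
The pull versus push cost comparison mentioned above can be reasoned about with a rough model. The sketch below is only an illustration under stated assumptions: it assumes both methods are billed per thousand metric datapoints, and the names `MonitoringLoad`, `datapoints`, `pull_cost`, and `push_cost` are hypothetical, not part of any AWS or New Relic API. The prices are caller-supplied parameters; the session's actual figures are not reproduced here, and real pricing should be taken from the current AWS price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringLoad:
    """Hypothetical description of how much metric data is collected."""
    metric_count: int        # distinct metrics being collected
    interval_seconds: int    # seconds between datapoints for each metric
    hours: float = 730.0     # billing window, roughly one month by default


def datapoints(load: MonitoringLoad) -> float:
    """Total datapoints produced over the billing window."""
    samples_per_metric = load.hours * 3600 / load.interval_seconds
    return load.metric_count * samples_per_metric


def pull_cost(load: MonitoringLoad, price_per_1000_requested: float) -> float:
    """Cost of polling an API, assuming one metric request per datapoint."""
    return datapoints(load) / 1000 * price_per_1000_requested


def push_cost(load: MonitoringLoad, price_per_1000_updates: float,
              delivery_price_per_1000: float = 0.0) -> float:
    """Cost of streaming, assuming a per-update charge plus optional delivery."""
    unit_price = price_per_1000_updates + delivery_price_per_1000
    return datapoints(load) / 1000 * unit_price


if __name__ == "__main__":
    # Illustrative inputs only; the prices below are placeholders, not quotes.
    load = MonitoringLoad(metric_count=5000, interval_seconds=60)
    print(f"pull: {pull_cost(load, price_per_1000_requested=0.01):.2f}")
    print(f"push: {push_cost(load, price_per_1000_updates=0.003):.2f}")
```

Because both costs scale linearly with the number of datapoints in this model, the comparison reduces to the per-thousand unit prices plus any delivery charge on the push side. A real estimate would also need to account for factors the model leaves out, such as API batching limits and which metrics are actually included in a stream.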