Title

AWS re:Invent 2022 - Observability: Best practices for modern applications (COP344)

Summary

Roland Barcia and Greg Eppel presented on observability best practices for modern applications.
Modern apps are more difficult to observe due to their distributed nature, use of various technologies, and microservices architecture.
Observability should be considered a day zero problem, and modern apps should be built to be observed.
The session covered four best practices:
1. Navigating instrumentation options.
2. Optimizing the cost of high cardinality.
3. Reducing alarm fatigue.
4. Avoiding dangling traces.
AWS services have varying levels of support for tracing, and it's important to understand how to propagate traces across services.
OpenTelemetry is recommended for metrics and traces; for logs, other tools are recommended until OpenTelemetry log support reaches general availability (GA).
CloudWatch Embedded Metric Format can help manage high cardinality and cost.
Synthetic testing, machine learning, and alarm correlation can reduce alarm fatigue.
Instrumentation of code is necessary for tracing, and trace context must be passed across service boundaries to avoid dangling traces.
The session included hands-on examples and demos.
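
As a rough illustration of how CloudWatch Embedded Metric Format can keep high-cardinality data out of metric dimensions, the sketch below builds a single EMF log line by hand. It is not taken from the session. The function name, the "DemoApp" namespace, and the field names (Service, Latency, RequestId) are hypothetical. The idea is that only the low-cardinality field (Service) is declared as a dimension, while the high-cardinality field (RequestId) stays in the log record, where it can still be searched but does not create a new metric per value.

```python
import json
import time


def build_emf_record(namespace, service, latency_ms, request_id, timestamp_ms=None):
    """Return one EMF-formatted JSON log line (hypothetical helper for illustration)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # EMF timestamps are epoch milliseconds
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Only low-cardinality fields are listed as dimensions.
                    "Dimensions": [["Service"]],
                    "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
                }
            ],
        },
        "Service": service,       # dimension value
        "Latency": latency_ms,    # metric value
        "RequestId": request_id,  # high-cardinality: kept as a log property only
    }
    return json.dumps(record)


if __name__ == "__main__":
    # Example values are made up; the line would need to reach CloudWatch Logs
    # (for instance via stdout in a Lambda function) for the metric to be extracted.
    print(build_emf_record("DemoApp", "checkout", 42.5, "req-123"))
```

In practice a client library or agent would usually emit these records; the hand-built version is only meant to show which fields become metrics and which remain log data.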

The shift from monolithic to microservices architecture has significantly increased the complexity of observability.
Observability is not just about monitoring; it's about understanding the full lifecycle of logs, metrics, and traces within a system.
The use of various AWS services (Lambda, ECS, EKS, ROSA) and technologies (containers, serverless functions) requires a nuanced approach to observability.
AWS provides native services and support for popular open-source tools for observability, catering to different customer strategies.
The AWS Distro for OpenTelemetry supports metrics and traces; log support is expected in the future.
CloudWatch Embedded Metric Format is a powerful feature for managing telemetry data efficiently and cost-effectively.
Alarm management is crucial in modern applications to avoid alert fatigue and ensure that alarms are meaningful and actionable.
Tracing is complex and requires careful planning to ensure end-to-end visibility, especially when dealing with services that do not natively support tracing.
The session emphasized the shared responsibility model in AWS, where both AWS and customers must take part in the instrumentation for effective observability.
The presenters provided resources such as workshops, GitHub pages, and AWS Skill Builder courses for further learning and implementation of observability best practices.