Title
AWS re:Invent 2022 - Observability: Best practices for modern applications (COP344)
Summary
- Roland Barcia and Greg Eppel presented on observability best practices for modern applications.
- Modern apps are more difficult to observe due to their distributed nature, use of various technologies, and microservices architecture.
- Observability should be considered a day zero problem, and modern apps should be built to be observed.
- The session covered four best practices:
  - Navigating instrumentation options.
  - Optimizing the cost of high cardinality.
  - Reducing alarm fatigue.
  - Avoiding dangling traces.
- AWS services have varying levels of support for tracing, and it's important to understand how to propagate traces across services.
- OpenTelemetry is recommended for metrics and traces; for logs, use other tools until OpenTelemetry log support reaches GA.
- CloudWatch Embedded Metric Format can help manage high cardinality and cost.
- Synthetic testing, machine learning, and alarm correlation can reduce alarm fatigue.
- Instrumentation of code is necessary for tracing, and trace context must be passed across service boundaries to avoid dangling traces.
- The session included hands-on examples and demos.
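
The point about passing trace context across service boundaries can be illustrated with a small sketch. It is not taken from the session. It assumes the W3C Trace Context `traceparent` header format, and the helper names (`new_traceparent`, `child_traceparent`, `outgoing_headers`) are hypothetical. In practice an OpenTelemetry SDK propagator would normally do this work; the hand-rolled version only shows why a service that drops the header produces a dangling trace.

```python
import re
import secrets

# W3C traceparent: version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a brand-new trace (used when no valid context arrived)."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming):
    """Keep the caller's trace id, but mint a new span id for this hop."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid in the W3C format, so treat them as missing.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return new_traceparent()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers):
    """Build headers for a downstream call so the trace continues end to end."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in incoming_headers.items()}
    return {"traceparent": child_traceparent(lowered.get("traceparent"))}


if __name__ == "__main__":
    upstream = {"Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    print(outgoing_headers(upstream))  # same trace id, new span id
    print(outgoing_headers({}))        # no context arrived: a new trace starts here
```

If any hop calls its downstream service without forwarding the header, the downstream spans start a new trace id and can no longer be joined to the original request, which is the dangling-trace situation the session warns about.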
Insights
- The shift from monolithic to microservices architecture has significantly increased the complexity of observability.
- Observability is not just about monitoring; it's about understanding the full lifecycle of logs, metrics, and traces within a system.
- The use of various AWS services (Lambda, ECS, EKS, ROSA) and technologies (containers, serverless functions) requires a nuanced approach to observability.
- AWS provides native services and support for popular open-source tools for observability, catering to different customer strategies.
- The AWS Distro for OpenTelemetry supports metrics and traces, and logs are expected to be supported in the future.
- CloudWatch Embedded Metric Format is a powerful feature for managing telemetry data efficiently and cost-effectively.
- Alarm management is crucial in modern applications to avoid alert fatigue and ensure that alarms are meaningful and actionable.
- Tracing is complex and requires careful planning to ensure end-to-end visibility, especially when dealing with services that do not natively support tracing.
- The session emphasized the shared responsibility model in AWS, where both AWS and customers must take part in the instrumentation for effective observability.
- The presenters provided resources such as workshops, GitHub pages, and AWS Skill Builder courses for further learning and implementation of observability best practices.
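
As an illustration of how CloudWatch Embedded Metric Format relates to high cardinality, here is a minimal sketch that builds one EMF log line by hand. It is not code from the session. The `emf_record` helper, the `DemoApp` namespace, and the field names are assumptions chosen for the example. The idea it shows: only low-cardinality fields are declared as dimensions, while high-cardinality values such as a request ID stay in the log event as plain properties that can still be queried from the logs.

```python
import json
import time


def emf_record(namespace, dimensions, metrics, properties, timestamp_ms=None):
    """Build one EMF-style JSON log line.

    dimensions: {name: value}          low-cardinality, become metric dimensions
    metrics:    {name: (value, unit)}  extracted as CloudWatch metrics
    properties: {name: value}          high-cardinality context, stays in the log only
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, listing only the declared dimension keys.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (_value, unit) in metrics.items()
                    ],
                }
            ],
        }
    }
    # Dimension values, metric values, and extra properties are all root-level fields.
    record.update(dimensions)
    record.update({name: value for name, (value, _unit) in metrics.items()})
    record.update(properties)
    return json.dumps(record)


# Hypothetical values for illustration only.
line = emf_record(
    namespace="DemoApp",
    dimensions={"Service": "checkout"},
    metrics={"Latency": (123.0, "Milliseconds")},
    properties={"requestId": "req-0001", "customerId": "cust-42"},
    timestamp_ms=1700000000000,
)
print(line)
```

Written to standard output from a Lambda function, or shipped through the CloudWatch agent, a line like this would yield a `Latency` metric keyed only by `Service`. Adding `requestId` to the dimension list instead would create a separate metric per request, which is the cost problem the session describes.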