Title
AWS re:Invent 2023 - How not to practice observability (DOP404)
Summary
- Anand from ManageEngine, a division of Zoho Corporation, discusses common pitfalls in implementing observability.
- Observability is proactive and relies on historical data, unlike reactive monitoring.
- Quality of observability improves with the right data sampling, not just more data.
- Misconceptions about observability can lead to issues like overprovisioning and missing critical spikes in metrics.
- Dashboards should be created thoughtfully, to avoid technical debt and to ensure they address frequently referenced issues.
- Assumptions lead to incomplete observability that fails to capture every layer of an application.
- Misconfigurations in alerting can lead to alert fatigue and unnecessary costs.
- DevOps teams should avoid centralizing configurations and instead tailor observability practices to specific applications.
- Data hoarding and access restrictions can hinder effective observability.
- Platform engineering is emerging to address data unification challenges.
- Observability systems need failover mechanisms and should not cause system crashes.
- Knowledge transfer between shifts is crucial to avoid reinventing the wheel.
- Adopting new tools requires internal changes and should not be done just for the sake of using new technology.
- ManageEngine offers tools for observability and invites attendees to visit their booth for more insights.
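
The point above about the right sampling and missed spikes can be illustrated with a small sketch. It is not taken from the talk: the CPU series, the spike timing, and the helper names `point_sample` and `window_means` are all hypothetical, chosen only to show how a coarse sampling interval can hide a short spike.

```python
def point_sample(series, interval):
    """Keep every `interval`-th point (no aggregation)."""
    return series[::interval]


def window_means(series, interval):
    """Average each consecutive window of `interval` points."""
    return [
        sum(series[i:i + interval]) / len(series[i:i + interval])
        for i in range(0, len(series), interval)
    ]


# Hypothetical per-second CPU utilisation (%) over 5 minutes:
# a 40% baseline with a 10-second spike to 95% starting at t=125s.
cpu = [40.0] * 300
for t in range(125, 135):
    cpu[t] = 95.0

fine = point_sample(cpu, 1)      # every second
coarse = point_sample(cpu, 60)   # once a minute: t = 0, 60, 120, 180, 240
averaged = window_means(cpu, 60) # one-minute averages

print(max(fine))      # 95.0, the spike is visible
print(max(coarse))    # 40.0, the spike falls between samples
print(max(averaged))  # about 49.2, the spike is smoothed away
```

In this toy series the one-minute samples never land inside the spike, and one-minute averages dilute it to under 50%, so a dashboard built on either would suggest a healthy system. Sampling every second catches it, at the cost of sixty times more data to store and review.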
Insights
- Observability is a complex field that requires a balance between proactive data analysis and avoiding information overload.
- The right sampling rate is crucial for accurate observability, as both under-sampling and over-sampling can lead to misinterpretation of system health.
- Dashboard creation is a skill gap in many organizations, and dashboards should be created with a clear purpose and regular usage in mind.
- There is a risk of assuming that if individual parts of a system are fine, the whole system is fine, which can lead to missing systemic issues.
- Alerting configurations should be optimized to reduce noise and prevent alert fatigue among engineers.
- Decentralizing observability configurations can empower teams to tailor observability to their specific needs, avoiding a one-size-fits-all approach.
- Data accessibility and cross-team observability are essential for quick incident resolution.
- Platform engineering is becoming important for managing data across various tools and ensuring a unified view of observability data.
- Observability systems themselves need to be robust and not contribute to system instability.
- When adopting new tools, it's important to consider the people and processes involved, not just the capabilities of the tool itself.
- ManageEngine's experience with observability across a wide range of products and customers positions them as a knowledgeable entity in the field.
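
As one illustration of the alert-noise point above, here is a minimal sketch of cooldown-based deduplication, one common way to reduce repeated notifications. It is an assumption for illustration, not a method described in the talk; the function `dedupe_alerts`, the alert keys, and the 300-second cooldown are hypothetical.

```python
def dedupe_alerts(events, cooldown_s):
    """Return the events that should notify someone.

    `events` is a list of (timestamp_seconds, alert_key) pairs in time order.
    The first event for a key always notifies; later events for the same key
    notify only if at least `cooldown_s` seconds have passed since the last
    notification for that key.
    """
    last_sent = {}
    notify = []
    for ts, key in events:
        prev = last_sent.get(key)
        if prev is None or ts - prev >= cooldown_s:
            notify.append((ts, key))
            last_sent[key] = ts
    return notify


# Hypothetical alert stream: one disk alert firing repeatedly, one latency alert.
events = [
    (0, "disk_full:db-1"),
    (30, "disk_full:db-1"),
    (60, "disk_full:db-1"),
    (90, "high_latency:api"),
    (400, "disk_full:db-1"),
]

paged = dedupe_alerts(events, cooldown_s=300)
for ts, key in paged:
    print(ts, key)
```

With these inputs, five raw alerts become three notifications: the repeats at 30s and 60s are suppressed, while the distinct latency alert and the disk alert recurring after the cooldown still get through.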