Title

AWS re:Invent 2023 - How not to practice observability (DOP404)

Summary

Anand from ManageEngine, a division of Zoho Corporation, discusses common pitfalls in implementing observability.
Observability is proactive and relies on historical data, unlike reactive monitoring.
Observability quality improves with the right data sampling rate, not simply with more data.
Misconceptions about observability can lead to issues like overprovisioning and missing critical spikes in metrics.
Dashboards should be built deliberately, around issues the team refers to frequently, so that they do not become technical debt.
Assumptions can leave observability incomplete, so that some layers of an application are never captured.
Misconfigurations in alerting can lead to alert fatigue and unnecessary costs.
DevOps teams should avoid centralizing configurations and instead tailor observability practices to specific applications.
Data hoarding and access restrictions can hinder effective observability.
Platform engineering is emerging to address data unification challenges.
Observability systems need failover mechanisms and should not cause system crashes.
Knowledge transfer between shifts is crucial to avoid reinventing the wheel.
Adopting new tools requires internal changes and should not be done just for the sake of using new technology.
ManageEngine offers tools for observability and invites attendees to visit their booth for more insights.
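
The point above about sampling and missed spikes can be illustrated with a small sketch. It is not taken from the talk: the metric, the 5-second spike, and the function names (cpu_percent, sample, peak) are all hypothetical, chosen only to show how the polling interval decides whether a short spike is ever observed.

```python
def cpu_percent(second: int) -> float:
    """Hypothetical CPU reading: 40% baseline with a 95% spike
    lasting from second 125 up to (not including) second 130."""
    return 95.0 if 125 <= second < 130 else 40.0


def sample(duration_s: int, interval_s: int) -> list[float]:
    """Poll the metric every `interval_s` seconds over `duration_s` seconds."""
    return [cpu_percent(t) for t in range(0, duration_s, interval_s)]


def peak(samples: list[float]) -> float:
    """Highest value that the collected samples actually contain."""
    return max(samples)


# Compare polling intervals over the same 5-minute window.
for interval in (1, 5, 10, 60):
    points = sample(300, interval)
    print(f"every {interval:>2}s: {len(points):>3} points, peak seen = {peak(points)}")
```

In this sketch, polling every 10 or 60 seconds never lands inside the spike, so the data suggests a healthy 40% throughout. Polling every second catches it but stores 300 points where 60 (every 5 seconds) would have been enough, which is the trade-off between data volume and data quality.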

Observability is a complex field that requires a balance between proactive data analysis and avoiding information overload.
The right sampling rate is crucial for accurate observability, as both under-sampling and over-sampling can lead to misinterpretation of system health.
Dashboard creation is a skill gap in many organizations, and dashboards should be created with a clear purpose and regular usage in mind.
There is a risk of assuming that if individual parts of a system are fine, the whole system is fine, which can lead to missing systemic issues.
Alerting configurations should be optimized to reduce noise and prevent alert fatigue among engineers.
Decentralizing observability configurations can empower teams to tailor observability to their specific needs, avoiding a one-size-fits-all approach.
Data accessibility and cross-team observability are essential for quick incident resolution.
Platform engineering is becoming important for managing data across various tools and ensuring a unified view of observability data.
Observability systems themselves need to be robust and not contribute to system instability.
When adopting new tools, it's important to consider the people and processes involved, not just the capabilities of the tool itself.
ManageEngine's experience with observability across a wide range of products and customers gives it a well-informed perspective on the field.
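
The point above about reducing alert noise can be sketched as follows. This is one common technique, not a method described in the talk or a feature of any specific tool: notify only after a threshold has been breached for several consecutive evaluations, and stay silent until the metric recovers. The class name DebouncedAlert, the threshold of 90, and the sample readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DebouncedAlert:
    """Hypothetical alert rule that fires once per sustained breach."""
    threshold: float
    required_breaches: int  # consecutive breaches needed before notifying
    _streak: int = 0
    _firing: bool = False

    def observe(self, value: float) -> bool:
        """Return True only on the evaluation where the alert starts firing."""
        if value > self.threshold:
            self._streak += 1
        else:
            # Recovery resets the streak and re-arms the alert.
            self._streak = 0
            self._firing = False
        if self._streak >= self.required_breaches and not self._firing:
            self._firing = True
            return True
        return False


def count_notifications(values, threshold, required_breaches):
    """Number of notifications the debounced rule would send for `values`."""
    alert = DebouncedAlert(threshold, required_breaches)
    return sum(alert.observe(v) for v in values)


# Made-up readings: three brief blips and one sustained breach.
readings = [50, 92, 55, 91, 60, 93, 94, 95, 96, 58, 92, 50]
naive = sum(v > 90 for v in readings)           # one alert per breaching sample
debounced = count_notifications(readings, 90, 3)
print(f"naive alerts: {naive}, debounced alerts: {debounced}")
```

With these readings the naive rule notifies on every breaching sample, while the debounced rule notifies once, for the sustained breach only. The cost is a delay before the first notification, so the number of required breaches is something each team would tune for its own application.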