Title
AWS re:Invent 2022 - [NEW] Visibility into how internet issues impact app performance (COP345)
Summary
- Introduction: The session, led by Ritchie, focuses on CloudWatch Internet Monitor, a tool for network engineers and application developers to gain visibility into internet issues affecting application performance.
- CloudWatch Overview: A refresher on Amazon CloudWatch as the AWS-native observability platform for logs, metrics, and traces. It includes tools like ServiceLens, X-Ray, Logs Insights, and Resource Manager, as well as end-user experience tools like Synthetics, Evidently, and RUM (real user monitoring).
- Customer Challenges: Common issues include discrepancies between monitoring dashboards and actual customer experiences, difficulty in diagnosing internet-related performance problems, and the high cost of internet traffic monitoring.
- Amazon CloudWatch Internet Monitor: A new feature that provides insights into internet performance issues affecting applications, with recommendations for service improvements and traffic routing. It offers a global view of traffic patterns and health events, and it integrates with CloudWatch Logs and Amazon EventBridge.
- Demo: Ritchie demonstrates creating a monitor, viewing health scores, analyzing health events, and exploring traffic insights. The demo shows how to use the tool to improve end-user experience and make architectural decisions.
- Behind the Scenes: Harvo Jones discusses the development of CloudWatch Internet Monitor, the challenges of scale, and the importance of presenting data in an understandable way. He explains why the team chose availability and round-trip time as the key measurements, and how the data is filtered and analyzed so that each customer sees only the insights relevant to them.
- EA Sports Case Study: Peter Vido shares how EA Sports uses Internet Monitor to gain visibility into internet performance issues affecting their gaming applications, leading to better decision-making and improved player experiences.
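
The monitor-creation step from the demo can also be scripted. The following is a minimal sketch, assuming the boto3 `internetmonitor` client and its `create_monitor` call. The session itself only showed the console flow, so the helper names, the default city-network limit, the region, and the placeholder VPC ARN are all illustrative assumptions, and the parameter names should be checked against the current API reference.

```python
def build_monitor_request(name, resource_arns, max_city_networks=100):
    """Assemble keyword arguments for a CreateMonitor request.

    The default of 100 city-networks is an arbitrary illustrative value.
    """
    return {
        "MonitorName": name,
        "Resources": list(resource_arns),
        "MaxCityNetworksToMonitor": max_city_networks,
    }


def create_monitor(params, region="us-east-1"):
    """Send the request. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above runs without boto3

    client = boto3.client("internetmonitor", region_name=region)
    return client.create_monitor(**params)


# Placeholder account ID and VPC ID, not real resources.
request = build_monitor_request(
    "demo-monitor",
    ["arn:aws:ec2:us-east-1:111122223333:vpc/vpc-0123456789abcdef0"],
)
print(request)
```

Calling `create_monitor(request)` would submit the request. It is left uncalled here so the sketch runs without credentials.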
Insights
- CloudWatch Internet Monitor's Value: The tool significantly reduces the time required to diagnose internet-related issues, from days to minutes, by providing actionable insights without the need for additional coding or configuration.
- Scale and Complexity: Building Internet Monitor meant handling the vast scale of the internet, including billions of connected users, thousands of autonomous systems, and a multitude of routes and cities. The team addressed this by focusing on a few key measurements and aggregating the data into a personalized view for each customer.
- Customer-Centric Development: Feedback from customers like EA Sports, Stripe, Tech Mahindra, and Smado was crucial in shaping the product to ensure it met the needs of businesses relying on internet delivery.
- Trade-offs in Data Analysis: The team had to balance precision and recall in their anomaly detection and health event predictions, ensuring that the alerts provided to customers were both accurate and comprehensive.
- Impact on Business Operations: For companies like EA Sports, CloudWatch Internet Monitor provides a broader perspective on internet health, enabling them to make informed decisions, evaluate application performance, and reduce the mean time to detect and identify issues affecting their services.
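
The precision and recall trade-off mentioned above can be illustrated with a small worked example. This is a sketch under stated assumptions, not the team's actual detection method, which the session summary does not describe: the availability scores, the labeled "true" events, and the simple threshold rule are all made up for illustration.

```python
def detect(scores, threshold):
    """Flag every location whose availability score falls below the threshold."""
    return {name for name, score in scores.items() if score < threshold}


def precision_recall(predicted, actual):
    """Return (precision, recall) for a set of predicted events.

    Precision: share of raised alerts that were real events.
    Recall: share of real events that raised an alert.
    An empty denominator is treated as 1.0 by convention here.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical availability scores (percent) per location.
scores = {"city-a": 99.5, "city-b": 97.0, "city-c": 92.0, "city-d": 98.5}
# Hypothetical ground truth: locations that really had an issue.
actual_events = {"city-b", "city-c"}

# A strict threshold raises few alerts; a loose one raises many.
strict = precision_recall(detect(scores, 95.0), actual_events)
loose = precision_recall(detect(scores, 99.0), actual_events)
print("strict threshold:", strict)
print("loose threshold:", loose)
```

With these made-up numbers, the strict threshold raises only correct alerts but misses one real event, while the loose threshold catches both events at the cost of one false alarm. Tuning a detector means choosing a point between those two outcomes.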