Title
AWS re:Invent 2022 - Use data analytics to improve product quality & customer satisfaction (STG222)
Summary
- Seth Markle, a senior principal engineer in S3, discusses Amazon's approach to building services, focusing on S3's architecture, performance, and the importance of reducing variance in latency.
- S3 is composed of over 235 microservices, houses over 280 trillion objects, and averages around 100 million requests per second.
- The request path in S3, which handles GET and PUT requests, is managed by about a dozen microservices, each owned by a "two pizza team."
- Performance in S3 is measured in terms of latency, throughput, and request rates, with a focus on reducing the variance in latency.
- Seth explains the sources of latency, including network hops, storage media, computational overhead, and queuing.
- A data analysis pipeline is used to analyze performance logs from microservices to improve end-to-end performance and manage the ingestion of new hard drive technologies.
- Guest speakers Sesshu and Aldi, both from Coupang, discuss how the company uses data analytics and A/B testing to enhance customer experience.
- Coupang has 18 million users. Its data platform comprises a 32-petabyte data lake and a 2-petabyte data warehouse, runs 120,000 data processing jobs, and captures 7.1 million logs per minute.
- The experimentation platform at Coupang allows for running multiple versions of customer experiences to determine the best options based on data analysis.
- Aldi explains Coupang's A/B testing architecture and the importance of designing experiments, including hypothesis formulation and success metrics.
- The talk concludes with lessons learned, emphasizing the importance of planning for resumability, considering serverless options first, and starting small to find unexpected insights.
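The focus on latency variance in the summary above can be illustrated with a small sketch. It is not S3's actual pipeline; the function name `latency_summary` and the sample data are hypothetical, chosen only to show why a median can look healthy while tail percentiles reveal outliers.

```python
import statistics

def latency_summary(samples_ms):
    """Return median, p99, and standard deviation for latency samples (ms)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p99": cuts[98],
        "stdev": statistics.pstdev(samples_ms),
    }

# Hypothetical data: 99 fast requests and a single slow outlier.
samples = [10.0] * 99 + [500.0]
summary = latency_summary(samples)
print(summary)
```

With this data the median stays at 10 ms, while the p99 and the standard deviation are pulled up by the single outlier. That gap is the kind of signal an analysis over per-request performance logs would look for.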
Insights
- The architecture of S3 is designed to maintain a balance between independent operation of microservices and an end-to-end view of performance, which is critical for customer satisfaction.
- The variance in latency is a significant factor for data lake workloads, and Amazon uses a sophisticated data analysis pipeline to identify and reduce performance outliers.
- Coupang's use of data analytics and A/B testing is a prime example of how large-scale e-commerce platforms can leverage data to improve customer experience and make informed decisions.
- The experimentation platform at Coupang demonstrates the importance of a robust design phase in A/B testing, which includes defining clear hypotheses and success metrics.
- The evolution of Coupang's metrics computation from incremental to non-incremental and finally to a hybrid approach with state retention highlights the need for scalable, fault-tolerant systems in data analytics.
- The common themes identified by Seth Markle, such as planning for resumability, going serverless first, and starting small, are valuable insights for any organization looking to leverage data analytics for performance improvement and product quality.
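The ideas of state retention and resumability mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not Coupang's or Amazon's implementation: the event fields (`variant`, `converted`), the JSON state file, and all function names are hypothetical. Each batch is folded into retained per-variant counters, and a stored batch ID lets a restarted job replay a batch without double counting.

```python
import json
import os
import tempfile

def load_state(path):
    """Load retained state, or start fresh if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"last_batch": -1, "variants": {}}

def save_state(path, state):
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def process_batch(path, batch_id, events):
    """Fold one batch of events into the retained per-variant counters."""
    state = load_state(path)
    if batch_id <= state["last_batch"]:
        return state  # already applied, so replaying after a crash is safe
    for event in events:
        bucket = state["variants"].setdefault(
            event["variant"], {"users": 0, "conversions": 0}
        )
        bucket["users"] += 1
        bucket["conversions"] += int(event["converted"])
    state["last_batch"] = batch_id
    save_state(path, state)
    return state

# Hypothetical usage: two batches, then an accidental replay of the second.
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "state.json")
    process_batch(checkpoint, 0, [
        {"variant": "A", "converted": True},
        {"variant": "B", "converted": False},
    ])
    process_batch(checkpoint, 1, [{"variant": "A", "converted": False}])
    final = process_batch(checkpoint, 1, [{"variant": "A", "converted": False}])
```

Because only the new batch is read on each run, the cost per update stays proportional to the batch size rather than to the full history. The trade-off is that the retained state must be checkpointed reliably.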