Title
AWS re:Invent 2022 - How Riot Games processes 20 TB of analytics data daily on AWS (ANT341)
Summary
- Riot Games, known for League of Legends, processes about 20 TB of analytics data daily across its games and esports events.
- Initially, Riot ran an on-prem data center and collected telemetry with a system called Honu. Honu was schema-less, which brought its own set of challenges.
- As Riot prepared for new games, they moved to AWS infrastructure, creating a new analytics platform (AP) with enforced schema and Kafka for data ingestion.
- Honu was later updated to use Kafka and AWS services, improving performance and reducing the need for batch jobs.
- Riot's current state involves a unified event bus architecture called Rebus, which allows for regional processing and better event-driven capabilities.
- The Rebus architecture uses Kafka, a schema registry, and supports unique topics for streaming capabilities.
- Riot's future plans include onboarding more teams onto Rebus, consolidating pipelines, improving data warehousing, investigating regional data lakes, and increasing investment in MLOps.
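To make the schema-enforced, topic-based ingestion described above more concrete, here is a minimal Python sketch. It is not Riot's implementation: the in-memory `SCHEMAS` dictionary stands in for a schema registry, `InMemoryBus` stands in for Kafka, and the event name, field names, and `region.event_type` topic naming are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a schema registry:
# event type -> required field names and their expected Python types.
SCHEMAS = {
    "match_completed": {"match_id": str, "region": str, "duration_s": int},
}


class SchemaError(ValueError):
    """Raised when an event does not conform to its registered schema."""


def validate(event_type, payload):
    """Reject events with no registered schema, missing fields, or wrong types.

    Extra fields not named in the schema are allowed in this sketch.
    """
    schema = SCHEMAS.get(event_type)
    if schema is None:
        raise SchemaError(f"no schema registered for {event_type!r}")
    missing = sorted(set(schema) - set(payload))
    if missing:
        raise SchemaError(f"{event_type}: missing fields {missing}")
    for name, expected in schema.items():
        if not isinstance(payload[name], expected):
            raise SchemaError(f"{event_type}: field {name!r} must be {expected.__name__}")


def topic_for(event_type, region):
    """Assumed naming scheme: one topic per region and event type."""
    return f"{region}.{event_type}"


@dataclass
class InMemoryBus:
    """Toy replacement for a Kafka producer; keeps events in per-topic lists."""

    topics: dict = field(default_factory=dict)

    def publish(self, event_type, payload):
        # Enforce the schema at ingestion, before anything is written.
        validate(event_type, payload)
        topic = topic_for(event_type, payload["region"])
        self.topics.setdefault(topic, []).append(payload)
        return topic
```

Validating at publish time means malformed events are rejected at the producer rather than discovered later by downstream consumers, which is the general benefit the summary attributes to moving from a schema-less to a schema-enforced pipeline.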
Insights
- Riot Games has evolved its data architecture over time to handle increasing volumes of data and to provide better services to both developers and players.
- The transition from on-prem to cloud-based infrastructure on AWS allowed Riot to leverage more robust and scalable building blocks such as Kafka and managed AWS services.
- The move from a schema-less to a schema-enforced system helped improve data quality and usability for downstream consumers like data scientists and analysts.
- Riot's Rebus architecture represents a significant step towards a more modern, event-driven approach, enabling better scalability, isolation, and regional data processing.
- The company's focus on regional data processing and MLOps suggests a commitment to complying with data residency requirements and to enhancing its machine learning capabilities for improved game experiences.
- Riot's iterative approach to architecture, willingness to learn, and problem-solving mindset have been key to their success in managing large-scale data workloads on AWS.