Title
AWS re:Invent 2023 - Scaling Warhammer 40,000: Darktide from 0 to 100,000 players in 1 hour (GAM305)
Summary
- Presenters: Joris van der Donk (AWS Solutions Architect) and Andrew Klawich (Technical Director at Fatshark).
- Topic: Scaling the game Warhammer 40,000: Darktide on AWS to support over 100,000 concurrent players within an hour of launch.
- Fatshark's Background: A game studio based in Stockholm, Sweden, known for cooperative games such as the Warhammer: Vermintide series.
- Technical Challenges: Building a scalable, server-authoritative, cost-effective architecture for Darktide, ensuring a consistent player experience, and managing latency.
- Solutions Implemented:
  - Login Queue: Utilized AWS Lambda and ElastiCache for Redis to handle spikes in login requests (sketched after this list).
  - Immaterium Service: A Java-based service using gRPC and Redis for party management and player presence (a presence sketch follows this list).
  - Matchmaking and Game Server Allocation: Amazon GameLift FlexMatch for matchmaking and GameLift FleetIQ for managing EC2 instances and game session placement (a matchmaking sketch follows this list).
  - Global Accelerator: To optimize network paths and reduce latency for players worldwide.
- Results: Successful scaling to 30,000 vCPUs across regions, handling millions of enemies in-game, and maintaining a good player experience.
- Lessons Learned: Emphasizing serverless architecture, managed services, and observability; using jitter to avoid traffic spikes; and leveraging EC2 Spot Instances for cost savings.
- Future Plans: Expand server locations, improve CPU budgeting for game servers, and explore AWS Local Zones and new regions.
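
The talk itself contains no code, but the login-queue pattern above maps naturally onto a Redis sorted set scored by arrival time, with a Lambda handler reporting each caller's position. A minimal Python sketch, assuming the `redis-py` client against the ElastiCache endpoint; the key name, admission rate, and event shape are illustrative, not from the talk:

```python
import json
import os
import time

import redis  # redis-py client; ElastiCache for Redis is wire-compatible

# Connect to the ElastiCache endpoint (hypothetical environment variable).
r = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)

QUEUE_KEY = "login-queue"    # sorted set: member = player ID, score = arrival time
ADMIT_THRESHOLD = 100        # illustrative admission rate per poll

def handler(event, context):
    """Lambda handler: enqueue the caller and report their queue position."""
    player_id = event["player_id"]

    # ZADD with NX only sets the score on first insert, so retries keep
    # the player's original place in line.
    r.zadd(QUEUE_KEY, {player_id: time.time()}, nx=True)

    # ZRANK gives the 0-based position in score order.
    position = r.zrank(QUEUE_KEY, player_id)

    if position is not None and position < ADMIT_THRESHOLD:
        r.zrem(QUEUE_KEY, player_id)
        return {"statusCode": 200, "body": json.dumps({"admitted": True})}

    return {
        "statusCode": 200,
        "body": json.dumps({"admitted": False, "position": position}),
    }
```

Clients would poll such an endpoint on a randomized interval (see the jitter sketch at the end of the Insights section) so retries do not re-synchronize into fresh spikes.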
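The Immaterium service itself is Java-based and speaks gRPC, but its presence side can be illustrated with the classic Redis expiring-key pattern: each heartbeat refreshes a short-lived key, and a missing key means the player is offline. A minimal sketch in Python (key names and the TTL are assumptions for illustration):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

HEARTBEAT_TTL_SECONDS = 30  # illustrative: presence expires without a fresh heartbeat

def heartbeat(player_id: str) -> None:
    """Called on each client heartbeat; refreshes the presence key and its TTL."""
    r.set(f"presence:{player_id}", "online", ex=HEARTBEAT_TTL_SECONDS)

def is_online(player_id: str) -> bool:
    """A player is online iff their presence key has not yet expired."""
    return r.exists(f"presence:{player_id}") == 1

def party_online(party_members: list[str]) -> dict[str, bool]:
    """Check presence for a whole party in one round trip using MGET."""
    values = r.mget([f"presence:{m}" for m in party_members])
    return {m: v is not None for m, v in zip(party_members, values)}
```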
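On the matchmaking side, FlexMatch is driven through the GameLift API: the backend submits a ticket carrying the party's players and their measured latencies, and FlexMatch forms matches against a rule set. A minimal `boto3` sketch; the configuration name, attribute names, and party shape here are hypothetical:

```python
import uuid

import boto3

gamelift = boto3.client("gamelift", region_name="eu-west-1")

def request_match(party: list[dict]) -> str:
    """Submit a FlexMatch ticket for a party and return the ticket ID.

    Each entry in `party` is assumed to look like:
    {"player_id": "...", "skill": 42, "latency_ms": {"eu-west-1": 25}}
    (an illustrative shape, not from the talk).
    """
    ticket_id = str(uuid.uuid4())
    gamelift.start_matchmaking(
        TicketId=ticket_id,
        ConfigurationName="darktide-mission-queue",  # hypothetical config name
        Players=[
            {
                "PlayerId": p["player_id"],
                "PlayerAttributes": {"skill": {"N": p["skill"]}},
                # Latency per region lets FlexMatch place the match near players.
                "LatencyInMs": p["latency_ms"],
            }
            for p in party
        ],
    )
    return ticket_id

def check_ticket(ticket_id: str) -> str:
    """Poll the ticket status (e.g. SEARCHING, COMPLETED, FAILED)."""
    resp = gamelift.describe_matchmaking(TicketIds=[ticket_id])
    return resp["TicketList"][0]["Status"]
```
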
Insights
- Serverless First Approach: Starting with serverless services like Lambda allowed Fatshark to rapidly prototype and scale, with the flexibility to move to other services like Fargate or EC2 if needed.
- Managed Services: Fatshark's reliance on AWS managed services like DynamoDB, MemoryDB, and Aurora Serverless v2 helped them focus on game development rather than infrastructure management.
- Cost-Effective Scaling: The use of EC2 Spot Instances and careful right-sizing of the architecture were key strategies for cost savings at launch (a FleetIQ Spot sketch follows this list).
- Traffic Management: Implementing a login queue with jitter and spreading player traffic over time were effective in managing the massive influx of players at launch (a jitter sketch follows this list).
- Global Deployment: The deployment of game servers in over 12 regions and the use of AWS Global Accelerator ensured low latency and a good player experience globally.
- Observability and Monitoring: Tools like Honeycomb were crucial for identifying and resolving backend performance issues and bugs.
- Future Enhancements: Plans to use AWS Local Zones and new regions to bring servers closer to players, and to refine CPU budgeting for game servers to handle variable workloads more efficiently.
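
The Spot strategy above maps onto FleetIQ's game server groups, where several instance types are declared and FleetIQ balances Spot against On-Demand capacity. A minimal `boto3` sketch; the group name, role ARN, launch template, and instance types are placeholders:

```python
import boto3

gamelift = boto3.client("gamelift", region_name="eu-west-1")

# FleetIQ manages an Auto Scaling group under the hood; it needs an IAM role
# and an EC2 launch template that boots the game server image (placeholders here).
gamelift.create_game_server_group(
    GameServerGroupName="darktide-servers-euw1",             # hypothetical name
    RoleArn="arn:aws:iam::123456789012:role/FleetIQRole",    # placeholder ARN
    MinSize=1,
    MaxSize=1000,
    LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0"},  # placeholder
    # Offering several instance types lets FleetIQ pick whichever Spot pool
    # is currently cheapest and least likely to be interrupted.
    InstanceDefinitions=[
        {"InstanceType": "c5.4xlarge"},
        {"InstanceType": "c5a.4xlarge"},
        {"InstanceType": "c6i.4xlarge"},
    ],
    # SPOT_PREFERRED falls back to On-Demand when viable Spot capacity is scarce.
    BalancingStrategy="SPOT_PREFERRED",
)
```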
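Finally, the jitter mentioned in both lists is simply a randomized delay added to retries, so that thousands of clients failing at the same moment do not all retry at the same moment. A minimal sketch of exponential backoff with full jitter (the constants are illustrative):

```python
import random
import time

BASE_DELAY_S = 1.0   # illustrative first-retry delay
MAX_DELAY_S = 60.0   # illustrative cap on any single wait

def retry_with_jitter(attempt_fn, max_attempts: int = 8):
    """Retry attempt_fn with 'full jitter': sleep a uniform random amount
    between 0 and an exponentially growing cap, so retries decorrelate."""
    for attempt in range(max_attempts):
        try:
            return attempt_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            cap = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
            time.sleep(random.uniform(0, cap))
```

Full jitter trades a slightly longer average wait for a much flatter aggregate retry curve, which is exactly what a login queue needs at launch.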