Title
AWS re:Invent 2023 - How to build a platform that developers love (DOP219)
Summary
- Jason Hand, a senior developer advocate at Datadog, introduces the session on platform engineering with Ross from Starbucks.
- Ross discusses the evolution of Starbucks' platform engineering in three phases: Origin, Evolution, and Expansion.
- The Origin phase involved learning to manage infrastructure in code with bespoke configurations and tooling.
- The Evolution phase focused on automating components, introducing an SRE team, and asking long-term questions about support and best practices.
- The Expansion phase is about scaling, refining processes, and empowering application teams with tools like a CLI for interacting with Kubernetes and Vault.
- Guiding principles for Starbucks' platform engineering include a strong starting point, leading by example, and long-term aspirations.
- Starbucks' platform is secure by design and uses a shared ownership model between platform teams and application teams.
- The platform team is responsible for the platform's existence and uptime, while application teams own their runtime and utilization of platform resources.
- Efforts to reduce pain for developers include tenant-facing improvements, automation, and defined processes for operations.
- Onboarding new tenants is a key focus, with initial setup meetings, demo environments, and templated monitoring.
- Observability is integral, with Datadog being the tool of choice for monitoring and alerting.
- The future of Starbucks' platform includes cost optimization and leveraging Datadog's features for this purpose.
- Ross concludes that DevOps is not dead; platform engineering builds upon its principles, emphasizing ownership, tools for observability, and community support.
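
The templated monitoring mentioned for tenant onboarding could look roughly like the sketch below. It is only an illustration of the idea: the template fields, the query strings, the default threshold, and the `render_monitors` helper are all assumptions and do not reflect Starbucks' tooling or Datadog's actual monitor syntax.

```python
from string import Template

# Hypothetical monitor templates. The names and query strings are
# illustrative placeholders, not real Datadog monitor definitions.
MONITOR_TEMPLATES = [
    {
        "name": Template("[$tenant] Pod restarts in $namespace"),
        "query": Template("restarts{namespace:$namespace} > $threshold"),
    },
    {
        "name": Template("[$tenant] Error rate in $namespace"),
        "query": Template("errors{namespace:$namespace} > $threshold"),
    },
]


def render_monitors(tenant, namespace, threshold=5):
    """Fill every template with one tenant's values (assumed helper)."""
    values = {"tenant": tenant, "namespace": namespace, "threshold": threshold}
    return [
        {key: tpl.substitute(values) for key, tpl in template.items()}
        for template in MONITOR_TEMPLATES
    ]


if __name__ == "__main__":
    # A new tenant gets the same baseline monitors as every other tenant.
    for monitor in render_monitors("payments", "payments-prod"):
        print(monitor["name"], "->", monitor["query"])
```

The point of the template approach is that each new tenant starts with the same baseline monitors without anyone writing them by hand.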
Insights
- Platform engineering is gaining traction in DevOps circles, focusing on enabling engineers to be more self-reliant and efficient.
- Starbucks has a significant technology organization that supports a wide range of backend operations, not just coffee sales.
- The transition from hand-built infrastructure to automation is critical for scalability and manageability.
- The introduction of an SRE team at Starbucks acted as a bridge between platform engineers and tenants, focusing on support and knowledge transfer.
- Ownership is a central theme, with the belief that teams taking full responsibility for their services will provide better support and improvements.
- Community forums and Slack channels are used for support and knowledge sharing, reducing reliance on the platform team and fostering a community of users.
- Observability and monitoring are essential for platform engineering, with Datadog being a key partner for Starbucks in this area.
- The shared responsibility model, inspired by AWS, clearly defines the roles of the platform team and application teams, which helps prevent silos by establishing ownership.
- Continuous improvement, a blameless culture, and openness to new ideas are part of Starbucks' core philosophy for platform engineering.
- Learning from failures and iterating on processes is a recurring theme, highlighting the importance of adaptability and feedback in platform development.
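
The shared responsibility model described above can be written down as an explicit ownership map. The following is a minimal sketch that uses only the split stated in the summary (platform team: existence and uptime; application teams: runtime and resource utilization). The `Owner` enum, the concern labels, and the `owner_of` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class Owner(Enum):
    PLATFORM_TEAM = "platform team"
    APPLICATION_TEAM = "application team"


# Ownership split as stated in the summary; the keys are assumed labels.
RESPONSIBILITIES = {
    "platform existence": Owner.PLATFORM_TEAM,
    "platform uptime": Owner.PLATFORM_TEAM,
    "application runtime": Owner.APPLICATION_TEAM,
    "utilization of platform resources": Owner.APPLICATION_TEAM,
}


def owner_of(concern):
    """Return the owner of a concern, failing loudly if none is defined."""
    try:
        return RESPONSIBILITIES[concern]
    except KeyError:
        raise LookupError(f"no owner defined for {concern!r}") from None


if __name__ == "__main__":
    print(owner_of("platform uptime").value)
    print(owner_of("application runtime").value)
```

Raising an error for an unlisted concern is a deliberate choice in this sketch: a gap in ownership surfaces immediately instead of silently falling to nobody.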