How to Build a Platform That Developers Love (DOP219)

Title

AWS re:Invent 2023 - How to build a platform that developers love (DOP219)

Summary

  • Jason Hand, a senior developer advocate at Datadog, introduces the session on platform engineering with Ross from Starbucks.
  • Ross discusses the evolution of Starbucks' platform engineering in three phases: Origin, Evolution, and Expansion.
  • The Origin phase involved learning to manage infrastructure in code with bespoke configurations and tooling.
  • The Evolution phase focused on automating components, introducing an SRE team, and asking long-term questions about support and best practices.
  • The Expansion phase is about scaling, refining processes, and empowering application teams with tools like a CLI for interacting with Kubernetes and Vault.
  • Guiding principles for Starbucks' platform engineering include a strong starting point, leading by example, and long-term aspirations.
  • Starbucks' platform is secure by design, with a shared ownership model between platform teams and application teams.
  • The platform team is responsible for the platform's existence and uptime, while application teams own their runtime and utilization of platform resources.
  • Efforts to reduce pain for developers include tenant-facing improvements, automation, and defined processes for operations.
  • Onboarding new tenants is a key focus, with initial setup meetings, demo environments, and templated monitoring.
  • Observability is integral, with Datadog being the tool of choice for monitoring and alerting.
  • Future work on Starbucks' platform includes cost optimization, using Datadog's features to support it.
  • Ross concludes that DevOps is not dead; rather, platform engineering builds on its principles, emphasizing ownership, tools for observability, and community support.

Insights

  • Platform engineering is gaining traction in DevOps circles, focusing on enabling engineers to be more self-reliant and efficient.
  • Starbucks has a significant technology organization that supports a wide range of backend operations, not just coffee sales.
  • The transition from hand-built infrastructure to automation is critical for scalability and manageability.
  • The introduction of an SRE team at Starbucks acted as a bridge between platform engineers and tenants, focusing on support and knowledge transfer.
  • Ownership is a central theme, with the belief that teams taking full responsibility for their services will provide better support and improvements.
  • Community forums and Slack channels are used for support and knowledge sharing, reducing reliance on the platform team and fostering a community of users.
  • Observability and monitoring are essential for platform engineering, with Datadog being a key partner for Starbucks in this area.
  • The shared responsibility model, inspired by AWS, clearly defines the roles of the platform team and application teams, which helps prevent silos by establishing ownership.
  • Continuous improvement, a blameless culture, and openness to new ideas are part of Starbucks' core philosophy for platform engineering.
  • Learning from failures and iterating on processes is a recurring theme, highlighting the importance of adaptability and feedback in platform development.