Operating Kubernetes Clusters at Cloud Scale (CON330)

Title

AWS re:Invent 2022 - Operating Kubernetes clusters at cloud scale (CON330)

Summary

  • Speakers: Vipul (Senior Manager for EKS) and Shyam (Kubernetes expert and co-chair of the upstream Kubernetes SIG Scalability).
  • EKS Overview: Amazon EKS is a managed Kubernetes service that aims to simplify Kubernetes operations, ensuring security, reliability, and scalability.
  • Kubernetes on AWS: 65% of Kubernetes users run their containers on AWS, including both EKS and self-managed Kubernetes.
  • EKS Principles: Fully upstream, certified, supports four versions of Kubernetes, and backports important patches.
  • EKS Availability: Present in most AWS regions and operates hundreds of thousands of clusters globally.
  • EKS Focus Areas: Security, availability, durability, reliability, scalability, performance, and efficiency.
  • Security: Isolation with private VPCs, supply chain security, compliance with standards, and security controls.
  • Availability: 99.95% SLA, AZ-redundant architecture, and mitigation of webhook-related issues.
  • Durability: Focus on etcd management, including backups and persistent volumes.
  • Reliability: Extensive testing of Kubernetes versions, learning from customer experiences, and upstream contributions.
  • Scalability: Pushing boundaries with scale tests, resolving issues, and aiming to increase the number of nodes supported in tests.
  • Performance: SLOs for pod startup times and API latencies, and improvements benefiting all Kubernetes users.
  • Efficiency: Right-sizing clusters based on workload demand without compromising availability.
  • Innovations: Improvements in zonal redundancy, control plane right-sizing, software delivery, and etcd management.
  • Software Delivery: Testing at multiple levels, reducing update times, and in-place container updates for critical security issues.
  • Resources: Documentation for getting started and best practices, public roadmap for feature requests, and additional sessions on EKS and containers.
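
To make the 99.95% availability figure above concrete, the following is a minimal sketch of the arithmetic that turns an availability percentage into a downtime budget. The function name `downtime_budget_minutes` and the 30-day and 365-day periods are assumptions chosen for illustration; the summary does not state how the SLA measures its period, so the official SLA text remains the authority on that.

```python
def downtime_budget_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Return the minutes of unavailability permitted by an availability target.

    sla_percent: availability target as a percentage, e.g. 99.95.
    period_days: length of the measurement window in days (assumed, not from the SLA).
    """
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the window NOT covered by the target.
    return total_minutes * (1.0 - sla_percent / 100.0)


if __name__ == "__main__":
    # Assumed 30-day month: about 21.6 minutes.
    print(round(downtime_budget_minutes(99.95), 1))
    # Assumed 365-day year: about 262.8 minutes.
    print(round(downtime_budget_minutes(99.95, 365), 1))
```

Under these assumptions, a 99.95% target leaves roughly 21.6 minutes of unavailability in a 30-day window, which is why the talk pairs the SLA with an AZ-redundant architecture.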

Insights

  • EKS is a critical part of AWS's container strategy, with a significant portion of Kubernetes users opting for AWS as their container platform.
  • The EKS team is deeply involved in the Kubernetes community, contributing to upstream projects and participating in security working groups.
  • EKS's architecture is designed for high availability and security, with each cluster control plane placed in its own VPC and redundancy across AZs.
  • EKS is continually innovating to improve the customer experience, with a focus on making Kubernetes operations as "boring" and simple as possible.
  • The EKS team is pushing the limits of Kubernetes scalability, working on increasing the number of nodes supported in tests and resolving scalability issues.
  • EKS's software delivery process is sophisticated, involving multiple levels of testing and a focus on reducing the time to release updates and fixes.
  • AWS encourages customer feedback and participation through its public roadmap and documentation resources, which help guide the development of EKS features.
  • The session highlights the complexity of operating Kubernetes at scale and AWS's efforts to abstract that complexity away, so that customers can focus on their applications rather than on infrastructure management.