Resilience Lifecycle: A Mental Model for Resilience on AWS (ARC312)

Title

AWS re:Invent 2023 - Resilience lifecycle: A mental model for resilience on AWS (ARC312)

Summary

  • Clark Ritchie, a principal technologist at AWS, introduces the AWS resilience lifecycle and emphasizes that resilience is essential for protecting revenue and maintaining customer trust.
  • Unplanned downtime costs Fortune 1000 companies billions in revenue, not to mention intangible costs like reputation damage.
  • The session highlights AWS's shared responsibility model for resilience: AWS is responsible for the resilience of the cloud, and customers are responsible for resilience in the cloud.
  • The AWS resilience lifecycle is presented as a journey, not a destination, with foundational resilience and continual resilience as key concepts.
  • The lifecycle includes five phases: setting objectives, designing and implementing, evaluating and testing, operating, and responding and learning.
  • AWS offers services and guidance that support each phase, including AWS Resilience Hub, the AWS Well-Architected Framework, and AWS Trusted Advisor.
  • Stacey Brown and Yoni from Vanguard share their company's approach to resilience, detailing their journey from reactive to proactive and ingrained resilience practices.
  • Vanguard's focus on observability, performance testing, chaos engineering, and culture change has led to a 5x increase in deliveries and a 30% reduction in major incidents.
  • The session concludes with an invitation to access the AWS Resilience Lifecycle White Paper via a QR code and an offer to answer questions.

Insights

  • Resilience is critical for businesses to avoid revenue loss and maintain customer trust, especially for applications that are crucial to an organization's operations.
  • The AWS shared responsibility model means customers must actively design their own workloads for resilience, because AWS is responsible only for the resilience of the underlying infrastructure.
  • The AWS resilience lifecycle is an iterative process that aligns with typical software development lifecycles, making it easier for organizations to integrate resilience into their existing practices.
  • The lifecycle's five phases are interconnected, with each phase providing feedback that informs the next, ensuring continuous improvement in resilience.
  • AWS provides a suite of services and tools to support each phase of the resilience lifecycle, helping customers design, implement, test, and operate resilient systems.
  • Vanguard's case study illustrates the practical application of the AWS resilience lifecycle, showing how a large organization can transform its approach to resilience and achieve significant improvements in system reliability and operational efficiency.
  • The session highlights the importance of a cultural shift towards resilience, with a focus on empowering engineers and developers to adopt resilience practices and tools as part of their regular workflow.
  • The AWS Resilience Lifecycle White Paper serves as a comprehensive guide for organizations looking to improve their resilience on AWS, offering detailed insights and best practices.