Title
AWS re:Invent 2022 - Build resilient architectures with Amazon EBS (STG219)
Summary
- Camen Tavares, Director of Product for Elastic Block Store (EBS), and Mark Wilson, Senior Principal Engineer at EBS, presented on building resilient architectures with Amazon EBS.
- They discussed the importance of understanding risks, minimizing the scope of impact, and expediting recovery to ensure application availability.
- The speakers highlighted the inevitability of failures and the need to design systems that can accommodate and minimize the impact of such failures.
- They emphasized the importance of Availability Zones (AZs) in AWS and how EBS has been designed to be a zonal service to leverage AZ independence.
- The talk covered the evolution of EBS, including its control plane and data plane, and how AWS has improved the service's resiliency and reduced its blast radius.
- They introduced the concept of static stability in system design: the tendency of a system to return to a steady state on its own after a disturbance.
- The speakers provided insights into how customers can build resiliency into their systems using AWS services and features, such as choosing the right EBS volume type, using EBS Snapshots, architecting across AZs, monitoring with CloudWatch, testing recovery procedures, and planning for recovery.
- They concluded with examples of how these principles can be applied in practice, using Amazon RDS as a reference architecture and discussing strategies for migrating legacy applications to the cloud.
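
The snapshot, cross-AZ, and recovery-testing practices listed above can be sketched roughly as follows. This is a minimal illustration and not code from the talk: it assumes `ec2` is a boto3 EC2 client, and the function names, the `purpose` tag, and the default `gp3` volume type are hypothetical choices made for the example.

```python
from typing import Any


def snapshot_volume(ec2: Any, volume_id: str, description: str = "recovery drill") -> str:
    """Create a point-in-time snapshot of an EBS volume and wait until it completes.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description,
        TagSpecifications=[
            {
                "ResourceType": "snapshot",
                # Hypothetical tag, used here only to make drill snapshots easy to find.
                "Tags": [{"Key": "purpose", "Value": "recovery-drill"}],
            }
        ],
    )
    snapshot_id = response["SnapshotId"]
    # Block until the snapshot reaches the "completed" state.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id


def restore_volume(
    ec2: Any, snapshot_id: str, availability_zone: str, volume_type: str = "gp3"
) -> str:
    """Create a new volume from a snapshot in the given Availability Zone.

    Restoring into a different AZ than the source volume is one way to
    rehearse recovery from an AZ-level disruption.
    """
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
        VolumeType=volume_type,
    )
    volume_id = response["VolumeId"]
    # Block until the new volume is available to attach.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return volume_id
```

With boto3 installed and credentials configured, a drill might call `snapshot_volume(boto3.client("ec2"), "<volume-id>")` and then `restore_volume(...)` with a second AZ in the same Region. Passing the client in as a parameter keeps the functions easy to exercise against a stub before running them against a real account.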
Insights
- Murphy's Law in System Design: The talk reinforced the principle that "stuff breaks," and designing for failure is not just a best practice but a necessity in cloud architecture.
- Blast Radius Reduction: AWS's focus on reducing the blast radius is a critical aspect of its resiliency strategy, so that any single failure affects as few customers as possible.
- Zonal Independence: The emphasis on AZ independence is a key strategy for AWS, allowing customers to build resilient architectures that can withstand AZ-level disruptions.
- EBS Evolution: The evolution of EBS from a monolithic control plane to a more distributed and resilient architecture shows how AWS has continued to improve the service's resiliency and scalability over time.
- Static Stability: The concept of static stability, borrowed from aviation, describes systems that return to a steady state on their own after a failure, reducing the need for human intervention.
- Shared Responsibility Model: The talk highlighted that while AWS provides the infrastructure and tools for resiliency, customers also have a role in implementing and managing their resiliency strategies.
- Recovery Planning: The importance of having a clear failover plan and the willingness to execute it was stressed, suggesting that customers should be proactive rather than reactive in their approach to recovery.
- Legacy Applications: The discussion on migrating legacy applications to the cloud provided practical insights into how businesses can leverage cloud benefits for applications that were not originally designed for cloud environments.