Title
AWS re:Invent 2022 - Data resiliency design patterns with AWS (STG206)
Summary
- Speakers: Rajesh Vijayaragwan, Principal Business Development Manager for AWS Storage, and Jay Rowlett, Senior Principal Engineer with AWS file, edge, and data services.
- Key Topics: Data resiliency, high availability, disaster recovery, shared responsibility model, AWS storage services, customer personas, backup strategies, and disaster recovery strategies.
- Data Resiliency: The ability of applications to withstand failures and recover from unexpected conditions.
- High Availability: Preventing loss of service, dealing with frequent events like component failures, and ensuring graceful degradation of services.
- Disaster Recovery: Recovering from loss of service. The key measures are Recovery Time Objective (RTO), the maximum acceptable time to restore service, and Recovery Point Objective (RPO), the maximum acceptable window of data loss.
- Shared Responsibility Model: AWS is responsible for the resiliency of the cloud infrastructure, while customers are responsible for the resiliency of their applications in the cloud.
- AWS Storage Services: AWS offers storage and data services such as Amazon S3, Amazon DynamoDB, and Amazon EFS, along with resiliency services such as AWS Backup, AWS Elastic Disaster Recovery, and AWS Resilience Hub.
- Customer Personas: On-premises backup users, hybrid customers, and digitally native customers.
- Disaster Recovery Strategies: Backup and restore, pilot light, warm standby, and multi-site active-active strategies.
- Tools and Services: AWS Resilience Hub, AWS Fault Injection Simulator, and game days for testing organizational readiness.
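The RTO and RPO measures in the summary can be made concrete with a small sketch. It checks whether a backup plan meets a pair of recovery objectives. The class names, the function names, and the sample durations are hypothetical and chosen only for illustration. The sketch also assumes that the worst-case data loss equals the backup interval, which ignores replication lag and backup duration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of data loss


@dataclass(frozen=True)
class BackupPlan:
    backup_interval: timedelta   # time between successive backups
    restore_duration: timedelta  # measured time to restore from a backup


def worst_case_data_loss(plan: BackupPlan) -> timedelta:
    # Assumption: a failure just before the next backup loses one full interval.
    return plan.backup_interval


def meets_objectives(plan: BackupPlan, objectives: RecoveryObjectives) -> bool:
    # The plan passes only if both the RPO and the RTO are satisfied.
    return (
        worst_case_data_loss(plan) <= objectives.rpo
        and plan.restore_duration <= objectives.rto
    )


# Hypothetical targets and plans, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
daily = BackupPlan(backup_interval=timedelta(hours=24), restore_duration=timedelta(hours=2))
frequent = BackupPlan(backup_interval=timedelta(minutes=30), restore_duration=timedelta(hours=2))

print(meets_objectives(daily, objectives))     # False: 24 h of possible data loss exceeds the 1 h RPO
print(meets_objectives(frequent, objectives))  # True: both objectives are met
```

In practice the restore duration should come from an actual restore test rather than an estimate, which ties into the testing point under Insights.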
Insights
- Data Resiliency vs. High Availability vs. Disaster Recovery: These terms are often confused but have distinct meanings. Data resiliency encompasses the overall ability to handle failures, high availability focuses on preventing service interruptions, and disaster recovery is about restoring service after significant disruptions.
- Importance of Testing: It's crucial to regularly test disaster recovery plans and backup systems to ensure they work as intended. AWS provides tools like AWS Resilience Hub and AWS Fault Injection Simulator to facilitate this.
- Cost vs. Complexity: Faster recovery comes at the price of higher cost and complexity. Strategies range from simple, low-cost backup and restore to more complex and costly multi-region active-active setups.
- Serverless for Resiliency: Serverless architectures can simplify the implementation of resilient systems by reducing the operational burden and allowing for easy scaling and failover.
- Organizational Engagement: Data resiliency is not just a technical issue but requires involvement from the entire organization, including business and legal teams, to set appropriate strategies and ensure compliance.
- Continuous Learning: AWS encourages ongoing learning and skill development in storage and data resiliency through resources like AWS Skill Builder, ramp-up guides, and training programs.
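The cost-versus-recovery-speed trade-off among the four disaster recovery strategies can be sketched as a simple selection rule: walk the strategies from least to most costly and pick the first one that meets the target RTO. The RTO figure attached to each strategy below is an illustrative assumption, not a number from the talk or an AWS guarantee, and `cheapest_strategy` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Optional

# Ordered from least to most costly and complex, matching the summary's list.
# The RTO values are illustrative assumptions only.
STRATEGIES = [
    ("backup and restore", timedelta(hours=24)),
    ("pilot light", timedelta(hours=1)),
    ("warm standby", timedelta(minutes=10)),
    ("multi-site active-active", timedelta(0)),
]


def cheapest_strategy(target_rto: timedelta) -> Optional[str]:
    """Return the first (cheapest) strategy whose assumed RTO meets the target."""
    for name, assumed_rto in STRATEGIES:
        if assumed_rto <= target_rto:
            return name
    return None  # only reached for a negative target


print(cheapest_strategy(timedelta(hours=48)))    # backup and restore
print(cheapest_strategy(timedelta(minutes=30)))  # warm standby
```

A real decision would also weigh RPO, budget, and compliance requirements, which is why the choice needs input from business and legal teams as well as engineering.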