Title
AWS re:Invent 2023 - Amazon Aurora HA and DR design patterns for global resilience (DAT324)
Summary
- The session focused on building resilient systems using Amazon Aurora, emphasizing high availability (HA) and disaster recovery (DR) patterns.
- Resilience is defined as a workload's ability to recover from disruptions, dynamically acquire resources to meet demand, and mitigate issues such as transient network problems.
- Availability and disaster recovery are the two pillars of resilience. Availability is measured in nines (e.g., 99.99% uptime), while disaster recovery is measured by the recovery time objective (RTO, how long recovery may take) and the recovery point objective (RPO, how much recent data may be lost).
- Aurora separates storage from compute, with storage kept as six copies distributed across three Availability Zones (AZs) for durability.
- Aurora provides continuous backup to S3, allowing for point-in-time recovery within a retention window.
- The volume clone feature creates test environments with production-like data without affecting production performance or doubling storage costs.
- Multi-AZ deployments improve availability by adding database instances in separate AZs; durability is unchanged because the storage layer already provides it.
- Read replicas can be used to offload read-only queries and scale out read performance.
- Aurora Global Database replicates data asynchronously across Regions, which lowers the RPO for a Region-wide failure.
- Write forwarding enables applications to perform writes in read-only regions by forwarding them to the primary region.
- AWS Backup simplifies cross-region backup and replication processes.
- The managed RPO setting in Aurora PostgreSQL (the rds.global_db_rpo parameter) keeps cross-Region replication lag within a defined bound.
- Account-level resilience can be achieved by copying backups to a separate AWS account.
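
The "nines" used above to express availability translate directly into a yearly downtime budget. The following is a minimal Python sketch of that arithmetic, assuming a 365-day year; the function name is illustrative and does not come from the session.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    # Fraction of the year the system is allowed to be unavailable.
    unavailable_fraction = (100 - availability_percent) / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Print the budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes/year")
```

Under these assumptions, a 99.99% target leaves roughly 52.56 minutes of downtime per year, which is why each additional nine tightens the requirement so sharply.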
Insights
- Aurora's design separates storage and compute, which underpins many HA and DR features, such as continuous backup and fast recovery.
- The ability to create volume clones for testing or batch processing keeps those workloads off the production instances, which protects production resilience without duplicating the full storage cost.
- Multi-AZ deployments and read replicas are key strategies for achieving high availability and scaling read performance in Aurora.
- Aurora Global Database replication is crucial for achieving a low RPO and keeping data durable across multiple Regions.
- Write forwarding is a powerful feature that allows for global deployment of applications with read-write capabilities without complex application changes.
- The managed RPO parameter (rds.global_db_rpo) suits applications with high-value transactions because it keeps replication lag within a defined threshold.
- Cross-account backup strategies provide an additional layer of resilience against account-specific issues, ensuring business continuity.
- The session highlighted the importance of testing resilience strategies, for example by using the switchover command to rehearse Region failovers and verify that the application handles them.
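
A switchover rehearsal like the one mentioned in the last point could be scripted as a repeatable drill. The sketch below is illustrative and not taken from the session. It assumes `rds_client` is a boto3 RDS client (created with `boto3.client("rds")`) whose `switchover_global_cluster` call takes `GlobalClusterIdentifier` and `TargetDbClusterIdentifier`. The `health_check` callable is hypothetical and stands in for whatever probe tells you the application is serving traffic again.

```python
import time
from typing import Any, Callable


def run_switchover_drill(
    rds_client: Any,
    global_cluster_id: str,
    target_cluster_arn: str,
    health_check: Callable[[], bool],
    timeout_seconds: float = 600.0,
    poll_seconds: float = 5.0,
) -> float:
    """Start a global database switchover and time the application's recovery.

    Returns the seconds elapsed between requesting the switchover and the
    first successful health check. Raises TimeoutError if the application
    does not become healthy within timeout_seconds.
    """
    started = time.monotonic()
    # Assumed boto3 call: promotes the target secondary cluster to primary.
    rds_client.switchover_global_cluster(
        GlobalClusterIdentifier=global_cluster_id,
        TargetDbClusterIdentifier=target_cluster_arn,
    )
    # Poll the (hypothetical) application probe until it passes or we time out.
    while time.monotonic() - started < timeout_seconds:
        if health_check():
            return time.monotonic() - started
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"application not healthy within {timeout_seconds} seconds of switchover"
    )
```

Passing the client in as an argument keeps the drill testable with a stub. Comparing the returned duration against the RTO target over repeated drills gives an observed recovery time rather than an assumed one.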