Title
AWS re:Invent 2022 - Building resilient networks (NET306)
Summary
- Speakers: Kyle Tedeschi, Principal Solutions Architect, and Scott Morrison, Networking Specialist Solutions Architect at AWS.
- Key Topics: Network resilience on AWS, shared responsibility model, CAP theorem, fault domains, AWS Hyperplane, NAT Gateway, Transit Gateway, Load Balancers (ALB, NLB, Gateway LB), multi-region resilience, inter-region routing, traffic routing options (Route 53, CloudFront, Global Accelerator), hybrid network resilience (Direct Connect, Site-to-Site VPN).
- Resilience Theory: Importance of redundancy, trade-offs in distributed systems (CAP theorem), and cost/complexity considerations for different workload tiers.
- Single Region Resilience: How AWS Hyperplane underpins zonal networking services; deploying NAT Gateways, Transit Gateway attachments, and load balancer nodes in each active AZ; and best practices for load balancing and DNS.
- Multi-Region Resilience: Inter-region routing with VPC peering, Transit Gateway peering, and Cloud WAN. Traffic routing options include Route 53, CloudFront, and AWS Global Accelerator.
- Hybrid Network Resilience: Direct Connect configurations for high and maximum resilience, use of Bidirectional Forwarding Detection (BFD) for faster failure detection and failover, and the importance of baselining and monitoring network performance.
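
The redundancy point under Resilience Theory can be made concrete with a little arithmetic. The Python sketch below assumes that redundant components fail independently, which is an idealization that rarely holds exactly, and it uses an illustrative 99% availability figure that is not an AWS SLA. The function names are hypothetical.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes failures are independent (an idealization).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime in minutes over a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


# Illustrative only: 0.99 is a made-up per-component figure.
for copies in (1, 2, 3):
    a = parallel_availability(0.99, copies)
    print(f"{copies} copies: availability={a:.6f}, "
          f"downtime={downtime_minutes_per_year(a):.1f} min/year")
```

Each added copy also adds cost and operational complexity, which is the trade-off the session raises for different workload tiers.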
Insights
- Shared Responsibility Model: AWS ensures the resilience of the underlying infrastructure, while customers are responsible for architecting their applications to leverage AWS's fault isolation zones.
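- CAP Theorem: A distributed system cannot guarantee consistency, availability, and partition tolerance all at once. Because network partitions can happen, customers must decide whether each workload favors consistency or availability when one occurs.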
- Fault Domains: AWS has multiple fault domains, including Regions, Availability Zones, and flows. Understanding these domains is crucial for building resilient networks.
- AWS Hyperplane: A key internal service that supports many AWS networking services, providing fault tolerance and high availability within an AZ.
- Load Balancers: Different types of AWS Load Balancers (ALB, NLB, Gateway LB) have specific resilience characteristics and best practices for deployment.
- Multi-Region Architectures: For applications requiring multi-region deployment, AWS provides several traffic routing options, each with its own resilience features.
- Hybrid Networks: Direct Connect and Site-to-Site VPN are critical for hybrid network resilience, with specific configurations recommended for different resilience levels.
- Monitoring and Baselining: Continuously monitoring the network and establishing performance baselines are essential for troubleshooting and for recognizing when behavior deviates from normal.
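
The baselining idea in the last point can be sketched in a few lines of Python. This is a minimal illustration, not the method described in the session: it assumes latency samples have already been collected from some monitoring source, uses a simple mean and standard deviation as the baseline, and flags values more than three standard deviations away. The sample values, function names, and threshold are all hypothetical.

```python
from statistics import mean, stdev
from typing import NamedTuple, Sequence


class Baseline(NamedTuple):
    mean_ms: float
    stdev_ms: float


def build_baseline(samples_ms: Sequence[float]) -> Baseline:
    """Summarize historical latency samples (milliseconds)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return Baseline(mean(samples_ms), stdev(samples_ms))


def is_anomalous(value_ms: float, baseline: Baseline,
                 threshold: float = 3.0) -> bool:
    """True when a measurement deviates from the baseline by more than
    `threshold` standard deviations (assumed threshold, tune per workload)."""
    if baseline.stdev_ms == 0:
        # A perfectly flat history: any change counts as a deviation.
        return value_ms != baseline.mean_ms
    return abs(value_ms - baseline.mean_ms) / baseline.stdev_ms > threshold


# Hypothetical round-trip latencies over a hybrid link, in milliseconds.
history = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
baseline = build_baseline(history)
print(is_anomalous(10.5, baseline))  # within normal variation
print(is_anomalous(25.0, baseline))  # far outside the baseline
```

Without a baseline like this, it is hard to tell during an incident whether an observed latency is a symptom or simply how the link normally behaves.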