Building Resilient Networks (NET306)

Title

AWS re:Invent 2022 - Building resilient networks (NET306)

Summary

  • Speakers: Kyle Tedeschi, Principal Solutions Architect, and Scott Morrison, Networking Specialist Solutions Architect at AWS.
  • Key Topics: Network resilience on AWS, shared responsibility model, CAP theorem, fault domains, AWS Hyperplane, NAT Gateway, Transit Gateway, Load Balancers (ALB, NLB, Gateway LB), multi-region resilience, inter-region routing, traffic routing options (Route 53, CloudFront, Global Accelerator), hybrid network resilience (Direct Connect, Site-to-Site VPN).
  • Resilience Theory: Importance of redundancy, trade-offs in distributed systems (CAP theorem), and cost/complexity considerations for different workload tiers.
  • Single Region Resilience: The role of AWS Hyperplane; deploying NAT Gateways, Transit Gateway attachments, and load balancer nodes in each active AZ; and best practices for load balancing and DNS.
  • Multi-Region Resilience: Inter-region routing with VPC peering, Transit Gateway peering, and Cloud WAN. Traffic routing options include Route 53, CloudFront, and AWS Global Accelerator.
  • Hybrid Network Resilience: Direct Connect configurations for high and maximum resilience, use of Bidirectional Forwarding Detection (BFD) for quick failover, and the importance of baselining and monitoring network performance.
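
The per-AZ deployment guidance for NAT Gateways can be checked mechanically. The following is a minimal sketch, not something shown in the session: it assumes a hand-built inventory of private subnets and NAT gateways (all IDs, AZ names, and the `PrivateSubnet` and `find_cross_az_dependencies` names are hypothetical) and flags any subnet whose default route targets a NAT gateway in a different AZ, since that subnet would lose outbound connectivity if the other AZ were impaired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateSubnet:
    """A private subnet and the NAT gateway its default route targets."""
    subnet_id: str
    az: str
    nat_gateway_id: str  # target of the subnet's 0.0.0.0/0 route


def find_cross_az_dependencies(subnets, nat_gateway_az):
    """Return (subnet_id, subnet_az, gateway_az) for every subnet whose
    default route depends on a NAT gateway outside its own AZ.

    nat_gateway_az maps a NAT gateway ID to the AZ it is deployed in.
    An unknown gateway ID yields gateway_az=None and is also reported.
    """
    problems = []
    for subnet in subnets:
        gateway_az = nat_gateway_az.get(subnet.nat_gateway_id)
        if gateway_az != subnet.az:
            problems.append((subnet.subnet_id, subnet.az, gateway_az))
    return problems


# Hypothetical inventory: three active AZs, but NAT gateways in only two.
nat_gateway_az = {"nat-a": "us-east-1a", "nat-b": "us-east-1b"}
subnets = [
    PrivateSubnet("subnet-1", "us-east-1a", "nat-a"),
    PrivateSubnet("subnet-2", "us-east-1b", "nat-b"),
    PrivateSubnet("subnet-3", "us-east-1c", "nat-a"),  # crosses AZs
]

for subnet_id, subnet_az, gateway_az in find_cross_az_dependencies(
    subnets, nat_gateway_az
):
    print(f"{subnet_id} in {subnet_az} routes via a NAT gateway in {gateway_az}")
```

In practice the inventory would come from the account's actual route tables rather than a literal list; the check itself stays the same.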

Insights

  • Shared Responsibility Model: AWS ensures the resilience of the underlying infrastructure, while customers are responsible for architecting their applications to leverage AWS's fault isolation zones.
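  • CAP Theorem: A distributed system cannot guarantee consistency, availability, and partition tolerance all at once. Because network partitions cannot be ruled out, customers must decide whether each workload favors consistency or availability when a partition occurs.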
  • Fault Domains: AWS has multiple fault domains, including regions, availability zones, and flows. Understanding these domains is crucial for building resilient networks.
  • AWS Hyperplane: A key internal service that supports many AWS networking services, providing fault tolerance and high availability within an AZ.
  • Load Balancers: Different types of AWS Load Balancers (ALB, NLB, Gateway LB) have specific resilience characteristics and best practices for deployment.
  • Multi-Region Architectures: For applications requiring multi-region deployment, AWS provides several traffic routing options, each with its own resilience features.
  • Hybrid Networks: Direct Connect and Site-to-Site VPN are critical for hybrid network resilience, with specific configurations recommended for different resilience levels.
  • Monitoring and Baselining: Establishing baselines for normal network performance and monitoring continuously against them are essential for troubleshooting and for confirming that the network remains resilient.
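
One way to picture baselining is as a summary of historical measurements that new measurements are compared against. The sketch below is an illustration under stated assumptions rather than the approach described in the session: the latency samples are made up, the function names are hypothetical, and the rule of flagging values more than three standard deviations from the mean is one simple choice among many.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize historical latency samples as (mean, standard deviation)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples_ms), stdev(samples_ms)


def deviates_from_baseline(value_ms, baseline, threshold=3.0):
    """Return True if value_ms lies more than `threshold` standard
    deviations away from the baseline mean."""
    avg, spread = baseline
    if spread == 0:
        # All historical samples were identical; any change is a deviation.
        return value_ms != avg
    return abs(value_ms - avg) / spread > threshold


# Made-up round-trip latencies (ms) over a hybrid link during normal operation.
history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
baseline = build_baseline(history)

for sample in (10.4, 25.0):
    status = "deviates" if deviates_from_baseline(sample, baseline) else "normal"
    print(f"{sample} ms: {status}")
```

A real deployment would typically rely on a monitoring service's alarms instead of hand-rolled statistics, but the idea is the same: without a recorded picture of normal behavior, there is nothing to compare a suspect measurement against.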