Dive Deep on AWS Networking Infrastructure (NET402)

Title

AWS re:Invent 2022 - Dive deep on AWS networking infrastructure (NET402)

Summary

  • J.R. Rivers and Stephen Callaghan, senior principal network engineers at AWS, presented a deep dive into AWS's networking infrastructure.
  • They discussed the Infrastructure Services Organization's role in building the physical components of AWS, including networking.
  • The session covered the guiding principles (tenets) for AWS's network design, emphasizing security, availability, capacity, and performance.
  • They highlighted the evolution of AWS's network from consuming off-the-shelf hardware to creating custom devices and innovating with proprietary technology.
  • The talk included technical details on AWS's use of routers, switches, and cables, and on how AWS manages more than 1 million network devices.
  • They discussed the importance of multi-tenancy, redundancy, and diverse supply chains in maintaining network reliability and performance.
  • The speakers shared insights into AWS's custom hardware innovations, such as encryption modules, hardware baseboard management controllers (BMCs), and optical modules.
  • They explained AWS's hybrid control plane approach, combining distributed and centralized models for network management.
  • The session also touched on AWS's active and passive monitoring systems, auto-remediation, and the future direction towards intent-based networking.
  • The presenters invited attendees to visit the AWS Village to see some of the networking hardware and encouraged feedback through a survey.
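Redundancy only helps if the redundant paths do not fail together. A shared risk link group (SRLG) is a set of links that share a physical resource, such as a fiber conduit or a power feed, so that a single event can take all of them down. The Python sketch below is an illustration, not something presented in the talk: it treats two paths as SRLG-diverse when their SRLG sets are disjoint. The link names, conduit IDs, and the helpers `path_srlgs`, `srlg_diverse`, and `diverse_pairs` are all hypothetical.

```python
from itertools import combinations

# Hypothetical inventory for illustration only: each link name maps to the
# SRLG IDs it belongs to (e.g. a shared fiber conduit or power feed).
LINK_SRLGS: dict[str, set[str]] = {
    "a-b": {"conduit-1"},
    "b-d": {"conduit-2"},
    "a-c": {"conduit-1"},  # runs through the same conduit as a-b
    "c-d": {"conduit-3"},
    "a-e": {"conduit-4"},
    "e-d": {"conduit-5"},
}


def path_srlgs(path: list[str]) -> set[str]:
    """Collect every SRLG that any link on the path belongs to."""
    risks: set[str] = set()
    for link in path:
        risks |= LINK_SRLGS[link]
    return risks


def srlg_diverse(path_a: list[str], path_b: list[str]) -> bool:
    """True if no single shared risk can take down both paths."""
    return path_srlgs(path_a).isdisjoint(path_srlgs(path_b))


def diverse_pairs(paths: list[list[str]]) -> list[tuple[int, int]]:
    """Return index pairs of candidate paths that are SRLG-diverse."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(paths), 2)
        if srlg_diverse(p, q)
    ]


if __name__ == "__main__":
    candidates = [["a-b", "b-d"], ["a-c", "c-d"], ["a-e", "e-d"]]
    # Paths 0 and 1 both traverse conduit-1, so they are not a safe pair.
    print(diverse_pairs(candidates))  # prints [(0, 2), (1, 2)]
```

In practice, SRLG data would come from physical inventory (fiber routes, conduits, facilities) rather than a hard-coded dictionary; the point is only that diversity is judged by shared physical risk, not by distinct link names.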

Insights

  • AWS places a high priority on security, going beyond standard practices by encrypting all links that leave their control and sanitizing equipment that leaves their red zones.
  • The network's design focuses on redundancy and diverse power feeds to ensure high availability, even against correlated failures such as those within shared risk link groups (SRLGs): sets of links that share a physical resource and can therefore fail together.
  • AWS's approach to network capacity involves developing its own hardware and software and sourcing from multiple suppliers to keep its supply chain diverse, mitigating risks like the disruptions seen during the COVID-19 pandemic and the semiconductor shortage.
  • Performance is a foundational aspect of AWS's network: AWS prefers to build a high-performance network rather than rely solely on traffic engineering.
  • AWS has evolved from using standard networking hardware to creating custom devices and innovating with proprietary technology, allowing them to meet specific needs and scale effectively.
  • The network uses a folded fat tree (Clos) architecture, which scales well and limits the size of fault domains; AWS has also moved toward single-chip platforms for simplicity and reliability.
  • AWS's network control plane is a hybrid model that combines the stability of distributed protocols with the visibility and prescriptiveness of centralized models.
  • Active monitoring systems and auto-remediation play a crucial role in maintaining network performance and reliability, with billions of observations per minute informing operational decisions.
  • AWS is exploring intent-based networking to ensure that multi-domain systems operate cohesively and according to expected behaviors.
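The scalability of a folded fat tree can be sketched with standard Clos arithmetic. Assuming every device is an identical single-chip switch with an even radix k, and that each non-top tier splits its ports evenly between downlinks and uplinks so the fabric is non-blocking, an n-tier folded Clos supports 2(k/2)^n hosts using (2n-1)(k/2)^(n-1) switches. This is textbook topology math, not AWS's actual design; the function name `folded_clos_capacity` and the radix values are hypothetical.

```python
def folded_clos_capacity(radix: int, tiers: int) -> tuple[int, int]:
    """Return (hosts, switches) for a non-blocking folded Clos built from
    identical switches with `radix` ports each (illustrative model only)."""
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even integer >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    half = radix // 2  # ports facing down (and, below the top tier, up)
    hosts = 2 * half ** tiers
    switches = (2 * tiers - 1) * half ** (tiers - 1)
    return hosts, switches


if __name__ == "__main__":
    # Hypothetical radix values; each added tier multiplies host capacity by k/2.
    for k in (32, 64):
        for n in (2, 3):
            hosts, switches = folded_clos_capacity(k, n)
            print(f"radix={k} tiers={n}: {hosts} hosts, {switches} switches")
```

Because each extra tier multiplies capacity by k/2, a higher-radix single-chip switch reaches a given fabric size with fewer tiers. The same structure limits fault domains: in a two-tier fabric, losing one spine removes only 1/(k/2) of each leaf's uplink capacity.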