Title
AWS re:Invent 2022 - Dive deep on AWS networking infrastructure (NET402)
Summary
- J.R. Rivers and Stephen Callaghan, senior principal network engineers at AWS, presented a deep dive into AWS's networking infrastructure.
- They discussed the Infrastructure Services Organization's role in building the physical components of AWS, including networking.
- The session covered the guiding principles (tenets) for AWS's network design, emphasizing security, availability, capacity, and performance.
- They highlighted the evolution of AWS's network from consuming off-the-shelf hardware to creating custom devices and innovating with proprietary technology.
- The talk covered technical details of AWS's routers, switches, and cabling, and how AWS manages more than 1 million network devices.
- They discussed the importance of multi-tenancy, redundancy, and diverse supply chains in maintaining network reliability and performance.
- The speakers shared insights into AWS's custom hardware innovations, such as encryption modules, hardware baseboard management controllers (BMCs), and optical modules.
- They explained AWS's hybrid control plane approach, combining distributed and centralized models for network management.
- The session also touched on AWS's active and passive monitoring systems, auto-remediation, and the future direction towards intent-based networking.
- The presenters invited attendees to visit the AWS Village to see some of the networking hardware and encouraged feedback through a survey.
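The redundancy theme above depends on knowing which links can fail together. Below is a minimal, hypothetical sketch of an SRLG diversity check. It is not AWS tooling. The link names, risk-group labels, `LINK_SRLGS`, `path_srlgs`, and `shared_risks` are all invented for illustration. For each pair of paths, it reports the shared risk groups (a conduit, a power feed, and so on) that a single failure could use to take both paths down at once.

```python
from itertools import combinations

# Hypothetical inventory (illustrative names only): each link maps to the
# shared risk groups it belongs to, e.g. a conduit or a power feed.
LINK_SRLGS: dict[str, set[str]] = {
    "link-a": {"conduit-1", "power-feed-A"},
    "link-b": {"conduit-2", "power-feed-B"},
    "link-c": {"conduit-1", "power-feed-B"},
}


def path_srlgs(path: list[str]) -> set[str]:
    """Union of every risk group touched by any link on the path."""
    risks: set[str] = set()
    for link in path:
        risks |= LINK_SRLGS[link]
    return risks


def shared_risks(paths: list[list[str]]) -> dict[tuple[int, int], set[str]]:
    """For each pair of paths (by index), return the risk groups they share.

    A pair missing from the result has no common SRLG, so no single
    shared-risk failure can disrupt both paths of that pair.
    """
    conflicts: dict[tuple[int, int], set[str]] = {}
    for (i, p), (j, q) in combinations(enumerate(paths), 2):
        common = path_srlgs(p) & path_srlgs(q)
        if common:
            conflicts[(i, j)] = common
    return conflicts


if __name__ == "__main__":
    # Paths 0 and 1 are SRLG-diverse; path 2 shares conduit-1 with path 0
    # and power-feed-B with path 1.
    print(shared_risks([["link-a"], ["link-b"], ["link-c"]]))
```

A real planner would compute the link-to-SRLG mapping from physical inventory and run a check like this before accepting a pair of paths as truly redundant.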
Insights
- AWS places a high priority on security, going beyond standard practice by encrypting every link that leaves its physical control and sanitizing any equipment that leaves its designated "red zones."
- The network's design emphasizes redundancy and diverse power feeds to ensure high availability, even when links share a common point of failure (for example, fibers running through the same conduit). Such links are tracked as shared risk link groups (SRLGs).
- AWS's approach to network capacity involves developing its own hardware and software and sourcing from multiple suppliers. This keeps the supply chain diverse enough to withstand disruptions such as those caused by the COVID-19 pandemic and the semiconductor shortage.
- Performance is a foundational aspect of AWS's network, with a preference for building a high-performance network over relying solely on traffic engineering.
- AWS has evolved from using standard networking hardware to creating custom devices and innovating with proprietary technology, allowing them to meet specific needs and scale effectively.
- The network uses a folded fat tree architecture, which scales out by adding devices and keeps fault domains small. AWS has also moved toward single-chip platforms for simplicity and reliability.
- AWS's network control plane is a hybrid model that combines the stability of distributed protocols with the visibility and prescriptiveness of centralized models.
- Active monitoring systems and auto-remediation play a crucial role in maintaining network performance and reliability, with billions of observations per minute informing operational decisions.
- AWS is exploring intent-based networking to ensure that multi-domain systems operate cohesively and according to expected behaviors.
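To show why a folded fat tree scales and limits fault domains, here is a rough sizing sketch. It uses the textbook three-tier k-ary fat tree, built entirely from identical k-port switches (each loosely analogous to a single-chip platform). This is a generic model and an assumption. It does not describe AWS's actual tier count, port counts, or topology. The names `FatTreeSize`, `k_ary_fat_tree`, and `core_capacity_lost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    pods: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int
    hosts: int

    @property
    def total_switches(self) -> int:
        return self.edge_switches + self.aggregation_switches + self.core_switches


def k_ary_fat_tree(k: int) -> FatTreeSize:
    """Size a textbook 3-tier k-ary fat tree built from identical k-port switches.

    Assumes k is even: each edge switch uses k/2 ports for hosts and k/2 uplinks.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even number of ports >= 2")
    half = k // 2
    return FatTreeSize(
        pods=k,
        edge_switches=k * half,         # k/2 edge switches in each of k pods
        aggregation_switches=k * half,  # k/2 aggregation switches per pod
        core_switches=half * half,      # (k/2)^2 core switches
        hosts=k * half * half,          # k^3 / 4 hosts in total
    )


def core_capacity_lost(k: int) -> float:
    """Fraction of core capacity removed when a single core switch fails."""
    return 1 / k_ary_fat_tree(k).core_switches


if __name__ == "__main__":
    for k in (4, 32, 64):
        t = k_ary_fat_tree(k)
        print(f"k={k}: {t.hosts} hosts, {t.total_switches} switches, "
              f"{core_capacity_lost(k):.4%} of core capacity per core-switch failure")
```

Host count grows with k cubed, and the share of core capacity lost to any one core-switch failure shrinks with k squared. Under this model, that is the sense in which the design is both scalable and fault-domain limiting.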