Best Practices for Operating on AWS (COP321)

Title

AWS re:Invent 2023 - Best practices for operating on AWS (COP321)

Summary

  • Introduction: Steve Rice introduces the session and himself as the general manager for AWS AppConfig. Guilherme Greco and Thiago Moraes are also introduced.
  • Customer Journey: Steve shares a story about a customer's journey to AWS, starting with a single workload and growing organically into a complex multi-account structure.
  • Foundation on AWS: The talk covers laying the foundation on AWS, operating at scale, and insights from Banco Itaú on implementing best practices.
  • Operating Models: Guilherme Greco discusses different operating models (traditional operations, centralized DevOps, and distributed DevOps) and how they align with multi-account strategies.
  • AWS Accounts and AWS Organizations: An AWS account is compared to an apartment within a landing zone. AWS Organizations helps manage multiple accounts centrally.
  • Best Practices: Guilherme outlines best practices, including defining the operating model, aligning it with a multi-account strategy, and centralizing governance with AWS Control Tower.
  • Demo: A demo is presented showing how to enable AWS services and enforce compliance using AWS Control Tower and proactive controls.
  • Operating at Scale: Steve discusses AWS's approach to operating at scale, including treating operational pain points like a feature backlog, automation, and resiliency through feature flags.
  • Systems Manager: Steve introduces AWS Systems Manager and its capabilities, emphasizing its role in streamlining operations.
  • Itaú's Implementation: Thiago Moraes shares Itaú's journey to the cloud, their organizational changes, multi-account strategy, and how they streamlined operations using AWS Systems Manager.
  • Conclusion: The session concludes with a reminder of the importance of designing operating models that can recover from failures and an invitation to visit the CloudOps kiosk.
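
The feature-flag approach mentioned under "Operating at Scale" can be illustrated with a minimal sketch. This is not the AWS AppConfig API and not code shown in the session: `FlagStore`, `handle_request`, and the flag name `new_pricing_path` are hypothetical names chosen for illustration. The sketch assumes flags live in a simple in-memory store and that an unknown flag should fall back to a safe default.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store (assumption, not a real AWS client)."""
    flags: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to the caller's default, so a missing
        # flag never switches on an untested code path.
        return self.flags.get(name, default)


def handle_request(store: FlagStore, amount: int) -> str:
    # The new code path ships dark and is only taken when the flag is on.
    if store.is_enabled("new_pricing_path"):
        return f"new:{amount * 2}"
    return f"old:{amount}"


store = FlagStore()
print(handle_request(store, 5))          # flag absent: old path
store.flags["new_pricing_path"] = True
print(handle_request(store, 5))          # flag on: new path
store.flags["new_pricing_path"] = False
print(handle_request(store, 5))          # flag off again: instant rollback
```

The design point is that turning the flag off restores the old behavior without a redeploy, which is what makes flags useful for resiliency.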

Insights

  • Organic Growth Challenges: The story shared by Steve highlights a common challenge for AWS customers: organic growth leading to a complex and potentially chaotic multi-account structure. This underscores the importance of planning and governance in cloud adoption.
  • Importance of Operating Models: Greco's discussion on operating models illustrates the need for clear strategies that align with business and technical requirements, particularly when scaling operations in the cloud.
  • Centralized Governance: The emphasis on centralized governance with AWS Control Tower suggests that while operations may be decentralized, control and compliance need to be managed centrally to maintain order and security.
  • Automation and Resiliency: Steve's points on automation and resiliency, including the use of feature flags, highlight AWS's focus on reducing operational overhead and improving service availability.
  • Systems Manager as a Key Tool: The introduction of AWS Systems Manager as a comprehensive tool for managing AWS resources indicates its significance for customers looking to streamline their cloud operations.
  • Real-World Application: Thiago's account of Itaú's cloud journey provides a practical example of how AWS best practices can be applied in a large enterprise, demonstrating the scalability and flexibility of AWS services.
  • Continuous Improvement: The session's content reflects AWS's commitment to continuous improvement and innovation, as evidenced by the recent launch of visual automation in Systems Manager and the ongoing evolution of Itaú's cloud strategy.
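
The idea behind the proactive controls mentioned in the demo, which check a resource before it is provisioned instead of detecting a violation afterwards, can be sketched in a few lines. This is an illustration only, not how AWS Control Tower is implemented: the rule list, the resource fields (`encrypted`, `public`), and the `provision` function are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Resource = Dict[str, object]
Rule = Tuple[str, Callable[[Resource], bool]]

# Hypothetical centrally managed rules; each returns True when compliant.
RULES: List[Rule] = [
    ("bucket must be encrypted", lambda r: r.get("encrypted") is True),
    ("bucket must block public access", lambda r: r.get("public") is False),
]


def violations(resource: Resource) -> List[str]:
    """Return the names of all rules the resource fails."""
    return [name for name, check in RULES if not check(resource)]


def provision(resource: Resource) -> str:
    # The check runs before anything is created, so a non-compliant
    # resource is rejected instead of being remediated later.
    failed = violations(resource)
    if failed:
        raise ValueError("blocked before provisioning: " + "; ".join(failed))
    return f"provisioned {resource['name']}"


print(provision({"name": "logs", "encrypted": True, "public": False}))
print(violations({"name": "scratch", "encrypted": False, "public": False}))
```

Keeping the rules in one central list while teams call `provision` themselves mirrors the pattern in the insights above: operations can be distributed while compliance stays centrally defined.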