Improve Resilience of SAP Workloads with AWS Support (SUP312)

Title

AWS re:Invent 2023 - Improve resilience of SAP workloads with AWS Support (SUP312)

Summary

  • 3M collaborated with AWS Support to improve the resilience of its SAP workload.
  • Kim Otto from 3M and AWS technical account managers Manik Chopra and Vijay Sitaram shared their experiences.
  • AWS Support provided engagements such as tabletop exercises, fault testing, and runbook reviews to improve disaster recovery (DR) exercises.
  • 3M used AWS Resilience Hub, AWS Fault Injection Service (FIS), and CloudWatch Application Insights to modernize its SAP resilience management.
  • The session emphasized the importance of high availability, continuity of operations, and continuous resilience.
  • Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are critical metrics for system resiliency.
  • Different categories of failures were discussed, including code deployment, core infrastructure, data corruption, dependencies, and regional outages.
  • 3M also used AWS Health, Trusted Advisor, and CloudWatch.
  • The key support engagements were a Well-Architected review with the SAP Lens, assessing response and incident handling, building and reviewing runbooks, managing and operating resilience, testing and validating recovery, and monitoring and observing availability.
  • 3M's SAP architecture includes two regions for production and non-production workloads, with a focus on identifying single points of failure.
  • Trusted Advisor Priority and Resilience Hub were used to assess and improve the resilience of 3M's SAP systems.
  • Fault Injection Service (FIS) was used to simulate and test various failure scenarios.
  • CloudWatch Application Insights for SAP provided monitoring capabilities for both infrastructure and application metrics.
  • Kim Otto highlighted the journey with AWS, focusing on reducing Mean Time to Recovery (MTTR), modernizing operations, and fostering a culture of resilience testing.
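
The RPO, RTO, and MTTR metrics mentioned above can be made concrete with a small calculation. The following is a minimal sketch, not anything shown in the session: the `RecoveryDrill` class, its field names, and the example targets are hypothetical. It treats the data-loss window as the time between the last recoverable data point and the failure, and downtime as the time between the failure and restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryDrill:
    """Timestamps recorded during one DR exercise (hypothetical structure)."""
    last_good_backup: datetime  # most recent recoverable data point
    failure_at: datetime        # when the outage began
    restored_at: datetime       # when the service was usable again

    @property
    def data_loss_window(self) -> timedelta:
        # Data written after the last good backup is lost; compare to the RPO.
        return self.failure_at - self.last_good_backup

    @property
    def downtime(self) -> timedelta:
        # Time the service was unavailable; compare to the RTO.
        return self.restored_at - self.failure_at


def meets_objectives(drill: RecoveryDrill, rpo: timedelta, rto: timedelta) -> bool:
    """True when the drill stayed within both the RPO and the RTO."""
    return drill.data_loss_window <= rpo and drill.downtime <= rto


def mean_time_to_recovery(drills: list[RecoveryDrill]) -> timedelta:
    """Average downtime across drills (one simple definition of MTTR)."""
    total = sum((d.downtime for d in drills), timedelta())
    return total / len(drills)


if __name__ == "__main__":
    start = datetime(2023, 11, 28, 10, 0)
    drill = RecoveryDrill(
        last_good_backup=start,
        failure_at=start + timedelta(minutes=10),
        restored_at=start + timedelta(minutes=60),
    )
    # Assumed targets for illustration only: RPO of 15 minutes, RTO of 1 hour.
    print(meets_objectives(drill, timedelta(minutes=15), timedelta(hours=1)))
    print(mean_time_to_recovery([drill]))
```

Tracking these values across repeated DR exercises gives a simple way to see whether recovery times are trending down.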

Insights

  • The collaboration between 3M and AWS demonstrates the value of AWS Support in enhancing the resilience of critical workloads.
  • AWS Resilience Hub and AWS Fault Injection Service (FIS) are powerful tools for assessing and testing the resilience of systems, allowing teams to identify and mitigate potential failure points proactively.
  • The use of AWS services for monitoring and observability, such as CloudWatch Application Insights, is crucial for maintaining system health and quickly identifying issues.
  • The integration of Trusted Advisor and Resilience Hub provides a comprehensive view of system resilience and actionable recommendations for improvement.
  • The session highlighted the importance of continuous resilience practices, including regular DR testing and updating runbooks for cloud environments.
  • The focus on RPO and RTO metrics underscores the need for businesses to align their technical resilience strategies with their business continuity requirements.
  • The journey of 3M serves as a case study for other organizations looking to improve their system resilience on AWS, particularly for complex and mission-critical applications like SAP.
  • The emphasis on cross-team communication and collaboration is key to successful resilience management, as it involves both infrastructure and application teams.
  • The session provided insights into the evolving nature of cloud services and the importance of staying current with new features and best practices to maintain a resilient infrastructure.
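
To illustrate the kind of fault test discussed above, the sketch below builds an FIS experiment template that stops one tagged EC2 instance and restarts it after a set duration. It is an assumption-laden example, not 3M's actual experiment: the function name, the target and action names, the tag, and the role ARN are all hypothetical. The code only builds a dictionary and makes no AWS calls.

```python
import json


def build_stop_instance_template(
    role_arn: str,
    tag_key: str,
    tag_value: str,
    restart_after: str = "PT5M",  # ISO 8601 duration; assumed value
) -> dict:
    """Build an FIS experiment template that stops one tagged EC2 instance."""
    return {
        "description": "Stop one tagged instance to test failover (illustrative)",
        "roleArn": role_arn,
        # No CloudWatch alarm is wired up here; a real experiment would
        # normally use an alarm as a stop condition.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "sapAppServer": {  # hypothetical target name
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",  # pick one matching instance
            }
        },
        "actions": {
            "stopOneInstance": {  # hypothetical action name
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": restart_after},
                "targets": {"Instances": "sapAppServer"},
            }
        },
    }


if __name__ == "__main__":
    template = build_stop_instance_template(
        role_arn="arn:aws:iam::123456789012:role/example-fis-role",  # placeholder
        tag_key="Workload",
        tag_value="sap-nonprod",
    )
    print(json.dumps(template, indent=2))
```

Under these assumptions, the dictionary could be passed as keyword arguments to the boto3 `fis` client's `create_experiment_template` call, ideally against a non-production system first and with a real stop condition in place.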