Bringing Software Engineering Rigor to Data Engineering (COM306)

Title

AWS re:Invent 2022 - Bringing Software Engineering Rigor to Data Engineering (COM306)

Summary

  • Zainab Maleki, a technical lead at Mechanical Rock (an AWS Advanced Partner), describes her move from software engineering to data engineering.
  • She emphasizes the importance of moving fast with confidence in data projects, drawing parallels between software and data engineering practices.
  • Maleki introduces the DORA (DevOps Research and Assessment) metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, Time to Restore Service) as measures of software delivery performance and suggests applying them to data engineering.
  • She discusses challenges in data engineering, such as slow-moving data platforms, slow platform adoption, overprotective data teams, and cluttered data pipelines.
  • Maleki advocates for a data mesh approach, decentralizing data teams, and domain-driven data design to improve data engineering practices.
  • She demonstrates how to use AWS Service Catalog for automated domain vending and governance in Snowflake.
  • Maleki concludes with actionable steps for data engineers, such as enforcing Git usage, setting up automated deployments, adding automated testing, separating platform administration from data pipelines, capturing data pipeline health metrics, and removing unnecessary admin access.
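To make the DORA metrics concrete for a data team, here is a minimal sketch that computes all four from a list of pipeline deployment records. The `Deployment` record shape and the `dora_metrics` helper are hypothetical, not something shown in the talk. It also assumes that a failure begins at the moment of the faulty deployment, which is a simplification. Medians are used for the two time-based metrics so that a single outlier does not dominate them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record of one production deployment of a data pipeline.
    committed_at: datetime                  # when the change was committed to Git
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # whether this deployment caused a failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, Optional[float]]:
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    hour = timedelta(hours=1)

    # Deployment Frequency: deployments per day over the observation window.
    frequency = len(deployments) / period_days

    # Lead Time for Changes: median commit-to-production time, in hours.
    lead_time = median((d.deployed_at - d.committed_at) / hour for d in deployments)

    # Change Failure Rate: fraction of deployments that caused a failure.
    failures = [d for d in deployments if d.failed]
    failure_rate = len(failures) / len(deployments)

    # Time to Restore Service: median hours from the faulty deployment to recovery
    # (assumes the failure starts at deployment time); None if nothing was restored.
    restore_hours = [(d.restored_at - d.deployed_at) / hour
                     for d in failures if d.restored_at is not None]
    time_to_restore = median(restore_hours) if restore_hours else None

    return {
        "deployment_frequency_per_day": frequency,
        "lead_time_hours": lead_time,
        "change_failure_rate": failure_rate,
        "time_to_restore_hours": time_to_restore,
    }


if __name__ == "__main__":
    t0 = datetime(2022, 11, 28, 9, 0)
    history = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True, restored_at=t0 + timedelta(hours=5)),
        Deployment(t0, t0 + timedelta(hours=6)),
    ]
    print(dora_metrics(history, period_days=3))
```

In practice the records would come from the CI/CD system that deploys the pipelines. This is one reason the talk's advice to put everything in Git and automate deployments comes first: without that discipline, these timestamps do not exist.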

Insights

  • The transition from software engineering to data engineering can be facilitated by applying software engineering principles to data projects, ensuring both speed and reliability.
  • DORA metrics, typically used in software delivery, can be a valuable tool for assessing and improving the performance of data engineering teams.
  • Data mesh and domain-driven design are emerging as effective strategies for managing complex data ecosystems, promoting scalability and team autonomy.
  • Automation and templating, as shown in the AWS Service Catalog demo, can significantly streamline the onboarding process for new data domains and enforce governance.
  • Maleki's talk underscores the need for cultural and process changes within data teams, such as embracing Git, automated testing, and role separation, to achieve operational excellence.
  • The challenges and solutions presented reflect broader trends in data engineering, where organizations are moving towards more agile, decentralized, and automated data management practices.
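The templating idea behind domain vending can be sketched without any AWS calls. The function below renders the Snowflake statements that a vending product might run for a new domain. The naming convention (`<DOMAIN>_DB`, `<DOMAIN>_WH`, `<DOMAIN>_ADMIN`, `<DOMAIN>_READER`), the allowed warehouse sizes, and the `render_domain_sql` helper are all hypothetical. In the talk, provisioning was driven through AWS Service Catalog, which this sketch does not model.

```python
import re

# Hypothetical convention: each domain gets its own database, warehouse,
# an admin role that owns both, and a reader role nested under the admin role.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
_WAREHOUSE_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE"}


def render_domain_sql(domain: str, warehouse_size: str = "XSMALL") -> list[str]:
    """Render Snowflake statements that vend a new data domain."""
    # Accept only plain identifiers and known sizes, so inputs cannot inject SQL.
    if not _IDENTIFIER.fullmatch(domain):
        raise ValueError(f"invalid domain name: {domain!r}")
    if warehouse_size not in _WAREHOUSE_SIZES:
        raise ValueError(f"unsupported warehouse size: {warehouse_size!r}")

    name = domain.upper()
    db, wh = f"{name}_DB", f"{name}_WH"
    admin, reader = f"{name}_ADMIN", f"{name}_READER"
    return [
        f"CREATE ROLE IF NOT EXISTS {admin}",
        f"CREATE ROLE IF NOT EXISTS {reader}",
        # Nest the domain roles and roll them up to SYSADMIN.
        f"GRANT ROLE {reader} TO ROLE {admin}",
        f"GRANT ROLE {admin} TO ROLE SYSADMIN",
        f"CREATE DATABASE IF NOT EXISTS {db}",
        f"CREATE WAREHOUSE IF NOT EXISTS {wh} "
        f"WITH WAREHOUSE_SIZE = '{warehouse_size}' AUTO_SUSPEND = 60",
        # The domain team owns its objects, so no shared admin role is needed day to day.
        f"GRANT OWNERSHIP ON DATABASE {db} TO ROLE {admin}",
        f"GRANT OWNERSHIP ON WAREHOUSE {wh} TO ROLE {admin}",
        # Readers can see the database and use the warehouse; table-level
        # SELECT grants are left to the domain team.
        f"GRANT USAGE ON DATABASE {db} TO ROLE {reader}",
        f"GRANT USAGE ON WAREHOUSE {wh} TO ROLE {reader}",
    ]


if __name__ == "__main__":
    for statement in render_domain_sql("sales"):
        print(statement + ";")
```

Keeping the template in code means every new domain receives the same governance rules. Those rules can then be reviewed in Git and tested like any other change.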