How Corning Built E2E ML on a Data Lakehouse Platform with Databricks (PRT321)

Title

AWS re:Invent 2022 - How Corning built E2E ML on a data lakehouse platform with Databricks (PRT321)

Summary

  • Jess Cornell from Databricks introduced the session, highlighting Databricks' role as a leader in the Lakehouse paradigm and its partnership with AWS.
  • Databricks, founded by the original creators of Apache Spark, also created Delta Lake and MLflow, and it aims to help data teams solve complex problems using data and AI.
  • The session focused on the convergence of BI and AI, the challenges of using separate platforms for data warehouses and data lakes, and how Databricks' Lakehouse platform addresses these issues.
  • Prem, a colleague of Jess, discussed the challenges of putting machine learning into production and how Databricks simplifies the machine learning lifecycle on the Lakehouse platform.
  • Dennis Kamotsky from Corning Incorporated presented a case study on implementing machine learning for manufacturing on Databricks' Lakehouse platform.
  • Corning used Databricks to build a low-latency model for detecting whether parts were clean or dirty, which was deployed across all Corning Environmental Technologies plants worldwide.
  • The solution involved centralizing data in the cloud, training the model, registering it in MLflow, and deploying it on the edge.
  • The project led to a $2 million cost avoidance in the first year and received an award from the Manufacturing Leadership Council for AI and machine learning in the industry.
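
The centralize, train, register, deploy-on-the-edge flow described above can be sketched in a few lines. This is a toy illustration only, not Corning's implementation and not MLflow's API: the `ModelRegistry` and `EdgeDevice` classes, the `part-cleanliness` model name, the `plant-a` identifier, and the single-feature threshold classifier are all hypothetical stand-ins, and the sample readings are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A "model" here is just a callable mapping one sensor reading to a label.
Model = Callable[[float], str]


def train_threshold_model(samples: List[Tuple[float, str]]) -> Model:
    """Fit a toy classifier: threshold at the midpoint of the two class means.

    Assumes (for illustration) that dirty parts produce higher readings.
    """
    clean = [x for x, label in samples if label == "clean"]
    dirty = [x for x, label in samples if label == "dirty"]
    threshold = (sum(clean) / len(clean) + sum(dirty) / len(dirty)) / 2
    return lambda x: "dirty" if x >= threshold else "clean"


@dataclass
class ModelRegistry:
    """In-memory stand-in for a central model registry (hypothetical)."""
    _versions: Dict[str, List[Model]] = field(default_factory=dict)

    def register(self, name: str, model: Model) -> int:
        """Store a new version under `name` and return its 1-based version."""
        versions = self._versions.setdefault(name, [])
        versions.append(model)
        return len(versions)

    def load(self, name: str, version: int) -> Model:
        """Fetch a specific, pinned version of a registered model."""
        return self._versions[name][version - 1]


@dataclass
class EdgeDevice:
    """Plant-side scorer that keeps a local copy of a pinned model version."""
    plant: str
    model: Optional[Model] = None

    def deploy(self, registry: ModelRegistry, name: str, version: int) -> None:
        # Pull once from the central registry; scoring then runs locally.
        self.model = registry.load(name, version)

    def inspect(self, reading: float) -> str:
        if self.model is None:
            raise RuntimeError("no model deployed on this device")
        return self.model(reading)


# 1. Centralized training data (made-up readings).
central_data = [(0.1, "clean"), (0.2, "clean"), (0.8, "dirty"), (0.9, "dirty")]

# 2. Train, then 3. register the model centrally.
registry = ModelRegistry()
version = registry.register("part-cleanliness", train_threshold_model(central_data))

# 4. Deploy the pinned version to an edge device and score locally.
edge = EdgeDevice(plant="plant-a")
edge.deploy(registry, "part-cleanliness", version)
print(edge.inspect(0.95), edge.inspect(0.05))
```

Pinning an explicit version at deploy time, rather than always loading the latest, is one way to keep every plant on a known model while a newer version is still being validated.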

Insights

  • The Lakehouse paradigm is gaining traction as it combines the benefits of data lakes and data warehouses, providing a unified platform for various data workloads.
  • Databricks' Lakehouse platform is multi-cloud and built on open formats and standards, which reduces the risk of vendor lock-in and technical debt.
  • Corning's use case demonstrates the practical application of Databricks' Lakehouse platform in a manufacturing context, showcasing the potential for AI to improve quality control and reduce costs.
  • The case study highlights the importance of cross-functional teams and the integration of data science, ML engineering, and application engineering to successfully deploy machine learning models in production.
  • The project's success is attributed to the ability to centralize data, leverage cloud resources for model training, and deploy models at the edge. Edge deployment is a common requirement in manufacturing environments.
  • The approach taken by Corning can serve as a blueprint for other manufacturing companies looking to implement end-to-end machine learning solutions for quality control and process optimization.