Title
AWS re:Invent 2022 - How Corning built E2E ML on a data lakehouse platform with Databricks (PRT321)
Summary
- Jess Cornell from Databricks introduced the session, highlighting Databricks' role as a leader in the Lakehouse paradigm and its partnership with AWS.
- Databricks, founded by the original creators of Apache Spark, also created Delta Lake and MLflow, and it aims to help data teams solve complex problems using data and AI.
- The session focused on the convergence of BI and AI, the challenges of using separate platforms for data warehouses and data lakes, and how Databricks' Lakehouse platform addresses these issues.
- Prem, a colleague of Jess, discussed the challenges of productionizing machine learning and how Databricks simplifies the machine learning lifecycle on the Lakehouse platform.
- Denis Kamotsky from Corning Incorporated presented a case study on implementing machine learning for manufacturing on Databricks' Lakehouse platform.
- Corning used Databricks to build a low-latency model for detecting whether parts were clean or dirty, which was deployed across all Corning Environmental Technologies plants worldwide.
- The solution involved centralizing data in the cloud, training the model, registering it in MLflow, and deploying it on the edge.
- The project led to $2 million in cost avoidance in the first year and earned an award from the Manufacturing Leadership Council for AI and machine learning in industry.
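
The centralize, train, register, and deploy flow described in the summary above can be illustrated with a small sketch. This is not Corning's implementation and it does not use the MLflow API: `ToyRegistry`, `train_threshold`, the model name `part_inspection`, and the sample scores are all hypothetical stand-ins, and the "model" is a simple threshold that assumes dirty parts produce higher scores than clean ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ModelVersion:
    version: int
    predict: Callable[[float], str]
    stage: str = "None"  # becomes "Production" or "Archived" after promotion


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for a model registry (not the MLflow API)."""
    models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, predict: Callable[[float], str]) -> ModelVersion:
        # Each registration under the same name gets the next version number.
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, predict=predict)
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        # Mark one version as Production and archive any previous Production version.
        for mv in self.models[name]:
            if mv.version == version:
                mv.stage = "Production"
            elif mv.stage == "Production":
                mv.stage = "Archived"

    def production(self, name: str) -> Optional[ModelVersion]:
        # What an edge device would fetch: the current Production version, if any.
        for mv in self.models.get(name, []):
            if mv.stage == "Production":
                return mv
        return None


def train_threshold(samples: List[Tuple[float, str]]) -> Callable[[float], str]:
    """Fit a threshold halfway between the mean clean and mean dirty scores.

    Assumption for illustration only: dirty parts score higher than clean parts.
    """
    clean = [score for score, label in samples if label == "clean"]
    dirty = [score for score, label in samples if label == "dirty"]
    threshold = (sum(clean) / len(clean) + sum(dirty) / len(dirty)) / 2
    return lambda score: "dirty" if score > threshold else "clean"


# 1. Centralize: labeled data gathered from plants into one place (made-up values).
central_data = [(0.1, "clean"), (0.2, "clean"), (0.8, "dirty"), (0.9, "dirty")]

# 2. Train and 3. register the resulting model, then promote it.
registry = ToyRegistry()
registered = registry.register("part_inspection", train_threshold(central_data))
registry.promote("part_inspection", registered.version)

# 4. Deploy: the edge side pulls the Production version and scores locally.
edge_model = registry.production("part_inspection")
print(edge_model.version, edge_model.predict(0.85), edge_model.predict(0.3))
```

Keeping the registry as the single hand-off point means retraining in the cloud only has to register and promote a new version; the edge side keeps asking for whatever is currently in production.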
Insights
- The Lakehouse paradigm is gaining traction as it combines the benefits of data lakes and data warehouses, providing a unified platform for various data workloads.
- Databricks' Lakehouse platform is multi-cloud and built on open formats and standards, which reduces the risk of vendor lock-in and technical debt.
- Corning's use case demonstrates the practical application of Databricks' Lakehouse platform in a manufacturing context, showcasing the potential for AI to improve quality control and reduce costs.
- The case study highlights the importance of cross-functional teams and the integration of data science, ML engineering, and application engineering to successfully deploy machine learning models in production.
- The project's success is attributed to the ability to centralize data, leverage cloud resources for model training, and deploy models at the edge; edge deployment is a common requirement in manufacturing environments.
- The approach taken by Corning can serve as a blueprint for other manufacturing companies looking to implement end-to-end machine learning solutions for quality control and process optimization.