Title
AWS re:Invent 2022 - How Corning built E2E ML on a data lakehouse platform with Databricks (PRT321)
Summary
- Jess Cornell from Databricks introduced the session, highlighting Databricks' role as a leader in the Lakehouse paradigm and its partnership with AWS.
- Databricks is known for creating Apache Spark, Delta Lake, and MLflow, and it aims to help data teams solve complex problems using data and AI.
- The session focused on the convergence of BI and AI, the challenges of using separate platforms for data warehouses and data lakes, and how Databricks' Lakehouse platform addresses these issues.
- Prem, a colleague of Jess, discussed the challenges of productionizing machine learning and how Databricks simplifies the machine learning lifecycle on the Lakehouse platform.
- Dennis Kamotsky from Corning Incorporated presented a case study on implementing machine learning for manufacturing on Databricks' Lakehouse platform.
- Corning used Databricks to build a low-latency model for detecting whether parts were clean or dirty, which was deployed across all Corning Environmental Technologies plants worldwide.
- The solution involved centralizing data in the cloud, training the model, registering it in MLflow, and deploying it at the edge.
- The project led to a $2 million cost avoidance in the first year and received an award from the Manufacturing Leadership Council for AI and machine learning in the industry.
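
The centralize, train, register, and deploy steps described in the summary can be sketched in a few lines. This is only an illustrative, standard-library sketch and not Corning's implementation: the `ThresholdModel`, the `ModelRegistry` class, the `part-cleanliness` name, and the sample readings are all hypothetical, and the in-memory registry merely stands in for the role the MLflow Model Registry plays in the session.

```python
import json
import statistics
from dataclasses import dataclass, field


@dataclass
class ThresholdModel:
    """Toy classifier (assumption): a part is 'dirty' above a threshold."""
    threshold: float

    def predict(self, value: float) -> str:
        return "dirty" if value > self.threshold else "clean"


def train(samples):
    """Fit the threshold halfway between the two class means."""
    clean = [x for x, label in samples if label == "clean"]
    dirty = [x for x, label in samples if label == "dirty"]
    return ThresholdModel((statistics.mean(clean) + statistics.mean(dirty)) / 2)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a versioned model registry."""
    versions: dict = field(default_factory=dict)

    def register(self, name: str, model: ThresholdModel) -> int:
        # Store a serialized artifact and return its 1-based version number.
        history = self.versions.setdefault(name, [])
        history.append(json.dumps({"threshold": model.threshold}))
        return len(history)

    def fetch(self, name: str, version: int) -> str:
        return self.versions[name][version - 1]


def load_on_edge(artifact: str) -> ThresholdModel:
    """Rebuild the model from its artifact, as an edge device would."""
    return ThresholdModel(**json.loads(artifact))


# 1. Centralized data (made-up readings standing in for plant data).
central_data = [(0.10, "clean"), (0.20, "clean"), (0.80, "dirty"), (0.90, "dirty")]

# 2. Train in the cloud, 3. register a version, 4. pull it down at the edge.
model = train(central_data)
registry = ModelRegistry()
version = registry.register("part-cleanliness", model)
edge_model = load_on_edge(registry.fetch("part-cleanliness", version))

print(version, edge_model.predict(0.7), edge_model.predict(0.3))
```

The point of the registry step is that the edge device loads a specific, versioned artifact rather than whatever the training job produced last, which makes rollouts and rollbacks across many plants explicit.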
 
Insights
- The Lakehouse paradigm is gaining traction as it combines the benefits of data lakes and data warehouses, providing a unified platform for various data workloads.
- Databricks' Lakehouse platform is multi-cloud and built on open formats and standards, which reduces the risk of vendor lock-in and technical debt.
- Corning's use case demonstrates the practical application of Databricks' Lakehouse platform in a manufacturing context, showcasing the potential for AI to improve quality control and reduce costs.
- The case study highlights the importance of cross-functional teams and the integration of data science, ML engineering, and application engineering to successfully deploy machine learning models in production.
- The project's success is attributed to the ability to centralize data, leverage cloud resources for model training, and deploy models at the edge. Edge deployment is a common requirement in manufacturing environments.
- The approach taken by Corning can serve as a blueprint for other manufacturing companies looking to implement end-to-end machine learning solutions for quality control and process optimization.