Title
AWS re:Invent 2022 - Detect and resolve biases in artificial intelligence (COM304)
Summary
- Mathilde Virginie, manager of the modern data department, discusses biases in AI, focusing on how to detect and resolve them.
- She highlights that nearly every AI model carries some bias, citing Google Translate's gender bias as an example (for instance, gender-neutral pronouns translated into stereotyped "he"/"she" forms).
- Virginie emphasizes the importance of addressing biases, since they can have serious real-world consequences, as in the fatal 2018 Uber self-driving car incident.
- The session includes a demo on a heart-disease dataset, illustrating how to detect biases both in the data and in the trained model.
- She walks through the CRISP-DM methodology, in which business understanding, data understanding, data preparation, modeling, and evaluation form an iterative loop.
- Various techniques and tools for detecting biases are demonstrated, including univariate and bivariate analysis, feature importance, LIME, and subpopulation analysis (a short sketch of the first two follows this summary).
- Virginie concludes by stressing that biases come from datasets rather than from data scientists, and that multiple explainable-AI algorithms should be combined to truly understand how a model works.
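As a minimal illustration of the first two techniques, the pandas sketch below runs univariate and bivariate checks on a hypothetical stand-in for the heart-disease data; the column names and values are assumptions, not the session's actual dataset.

```python
import pandas as pd

# Hypothetical stand-in for the session's heart-disease data; column
# names and values are assumptions, not the demo's actual dataset.
df = pd.DataFrame({
    "sex":    ["male", "male", "male", "female", "male", "female"],
    "age":    [54, 61, 47, 58, 66, 52],
    "target": [1, 1, 0, 0, 1, 1],  # 1 = heart disease present
})

# Univariate analysis: the distribution of a single feature. A heavily
# skewed distribution (e.g., far more male records than female) hints
# that one group may be under-represented in the data.
print(df["sex"].value_counts(normalize=True))

# Bivariate analysis: how the label varies with a sensitive feature.
# A large gap in positive rates between groups is a bias the model is
# likely to learn.
print(df.groupby("sex")["target"].mean())
print(pd.crosstab(df["sex"], df["target"], normalize="index"))
```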
Insights
- Biases in AI models are often unintentional and stem from the data used to train the models rather than the intentions of the data scientists.
- The iterative process of model development (CRISP-DM methodology) is crucial for identifying and mitigating biases at various stages, from business understanding to evaluation.
- Univariate and bivariate analyses are essential for understanding individual features and their interactions, which can reveal potential biases.
- Explainable AI (XAI) tools, such as feature importance and LIME, help reveal the decision-making process of AI models and surface biases (sketched after this list).
- Subpopulation analysis checks that a model performs equitably across different groups, making it a powerful way to detect and address discriminatory biases (see the final sketch after this list).
- The talk emphasizes the need for human oversight in the AI development process to ensure fairness and safety, as tools alone cannot fully discern the context and implications of potential biases.
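To make the XAI insight concrete, the sketch below pairs a global view (scikit-learn's impurity-based feature importances) with a local view (a LIME explanation of a single prediction). The synthetic data and feature names are assumptions standing in for the session's heart-disease demo, not its actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from lime.lime_tabular import LimeTabularExplainer

# Synthetic stand-in for the session's heart-disease data; the feature
# names are hypothetical, not taken from the demo.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
feature_names = ["age", "sex", "cholesterol", "max_heart_rate", "blood_pressure"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Global view: impurity-based feature importances. A sensitive feature
# ranking near the top suggests the model may be leaning on it.
for name, imp in sorted(zip(feature_names, model.feature_importances_),
                        key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {imp:.3f}")

# Local view: LIME fits a simple surrogate model around one instance to
# explain which features drove that single prediction.
explainer = LimeTabularExplainer(X_train,
                                 feature_names=feature_names,
                                 class_names=["no disease", "disease"],
                                 mode="classification")
explanation = explainer.explain_instance(X_test[0], model.predict_proba,
                                         num_features=5)
print(explanation.as_list())  # (feature condition, weight) pairs
```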
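Finally, a minimal sketch of subpopulation analysis, reusing `model`, `X_test`, and `y_test` from the sketch above; the group assignment derived from a synthetic feature is purely illustrative.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score

# Evaluation frame: true labels, model predictions, and a sensitive
# attribute (the grouping here is synthetic and illustrative only).
results = pd.DataFrame({
    "y_true": y_test,
    "y_pred": model.predict(X_test),
    "sex": ["male" if value > 0 else "female" for value in X_test[:, 1]],
})

# Subpopulation analysis: the same metrics, computed per group. A model
# that looks accurate overall can still fail badly for one group; large
# gaps here are the discriminatory biases the talk warns about.
for group, sub in results.groupby("sex"):
    acc = accuracy_score(sub["y_true"], sub["y_pred"])
    rec = recall_score(sub["y_true"], sub["y_pred"], zero_division=0)
    print(f"{group}: n={len(sub)} accuracy={acc:.2f} recall={rec:.2f}")
```

Per-group metrics like these are the quantitative signal; as the talk notes, deciding whether a gap is acceptable or discriminatory still requires human judgment.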