Title: AWS re:Inforce 2024 - Preserving privacy on data collaboration with AWS Clean Rooms (COM221)
Insights:
- Data Privacy Importance: Data privacy is increasingly critical because of frequent data breaches and regulations such as GDPR, and customers are increasingly concerned about their privacy.
- Data Collaboration Needs: Despite privacy concerns, data collaboration is essential for tasks like big data analysis and AI model training.
- Data Flow in Collaboration: Data moves from the data subject to the data custodian and then to a centralized collaborator for processing, increasing exposure risk at each step.
- Minimizing Data Exposure: To mitigate risks, it's crucial to minimize the amount of data shared at each stage of the collaboration process.
- Challenges with PII: Personally Identifiable Information (PII) such as social security numbers and passport numbers is easy to identify and remove, but removing it can reduce the usability of the data.
- Usability Trade-off: Removing PII can hinder collaboration. For example, two companies may need to match their customer records against each other without revealing the underlying PII.
- AWS Clean Rooms: AWS Clean Rooms lets data custodians share data within a collaboration while setting analysis rules that restrict which queries can be run on their data, preserving privacy while still enabling meaningful collaboration.
- Analysis Rules: Rule types such as aggregation and list analysis rules protect individual privacy by limiting the kinds of queries collaborators are allowed to run.
- Cryptographic Computing: AWS Clean Rooms supports Cryptographic Computing for Clean Rooms (C3R), which lets data custodians encrypt data before it is used in the collaboration, further protecting sensitive information.
- Output Privacy: Minimizing the data in query outputs is not enough, because attackers can combine those outputs with auxiliary data to re-identify individuals. Differential privacy addresses this by adding noise to query results.
- Differential Privacy: Differential privacy adds calibrated random noise to query results to protect each individual's privacy, and tracks a privacy budget that limits the total exposure across multiple queries.
- Practical Implementation: AWS Clean Rooms provides tools for setting privacy budgets, adding noise, and managing data collaboration securely, balancing privacy against data usability.
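The PII usability trade-off in the Insights list can be eased by matching on keyed pseudonyms instead of raw identifiers. The following is a minimal sketch under stated assumptions: each company normalizes its email addresses and computes an HMAC-SHA256 digest with a secret key shared only among the collaborators, so equal emails yield equal tokens without the raw addresses being exchanged. The names `SHARED_SECRET`, `pseudonymize`, and `match_customers` are hypothetical; this illustrates the general idea of joining on protected columns and is not the C3R client or its API.

```python
import hashlib
import hmac

# Hypothetical shared secret agreed out of band by the collaborators.
# In practice it would be randomly generated and never stored with the data.
SHARED_SECRET = b"example-shared-secret"


def pseudonymize(value: str, key: bytes = SHARED_SECRET) -> str:
    """Return a keyed HMAC-SHA256 hex digest of a normalized identifier."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def match_customers(ours: list[str], theirs: list[str]) -> set[str]:
    """Each party pseudonymizes its own emails; only digests are compared."""
    our_tokens = {pseudonymize(e) for e in ours}
    their_tokens = {pseudonymize(e) for e in theirs}
    return our_tokens & their_tokens


if __name__ == "__main__":
    company_a = ["alice@example.com", "Bob@Example.com", "carol@example.com"]
    company_b = ["bob@example.com", "dave@example.com"]
    print(len(match_customers(company_a, company_b)))  # 1
```

A keyed HMAC is used rather than a plain hash because unkeyed hashes of emails can be reversed by hashing a list of candidate addresses. Without the key an outsider cannot run that dictionary attack, but anyone holding the key still can, so the key must stay within the collaboration.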
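An aggregation-style analysis rule can be pictured as a gate that only releases per-group aggregates covering enough distinct individuals. This is a hypothetical in-memory sketch (the function `aggregate_with_threshold` and the threshold of 3 are assumptions for illustration), not how AWS Clean Rooms evaluates its rules: it sums a value per group and suppresses any group backed by fewer than `min_users` distinct users.

```python
from collections import defaultdict

# Hypothetical threshold; in practice the data custodian would choose it.
MIN_DISTINCT_USERS = 3


def aggregate_with_threshold(rows, group_key, user_key, value_key,
                             min_users=MIN_DISTINCT_USERS):
    """Sum value_key per group, suppressing groups with too few distinct users."""
    sums = defaultdict(float)
    users = defaultdict(set)
    for row in rows:
        group = row[group_key]
        sums[group] += row[value_key]
        users[group].add(row[user_key])
    # Only groups backed by at least min_users distinct users are released.
    return {g: total for g, total in sums.items() if len(users[g]) >= min_users}


if __name__ == "__main__":
    purchases = [
        {"region": "north", "user": "u1", "spend": 10.0},
        {"region": "north", "user": "u2", "spend": 20.0},
        {"region": "north", "user": "u3", "spend": 5.0},
        {"region": "north", "user": "u1", "spend": 5.0},
        {"region": "south", "user": "u4", "spend": 100.0},
    ]
    # 'south' is suppressed because it covers only one user.
    print(aggregate_with_threshold(purchases, "region", "user", "spend"))
```

Counting distinct users rather than rows matters: the "north" group has four rows but only three people, and a single heavy user must not be able to satisfy the threshold alone.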
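The output-privacy point can be made concrete with a differencing attack. In this toy example (made-up data, hypothetical names), each query returns only an aggregate count over four or more people, yet an attacker who also knows, as auxiliary data, that bob is the only person over 60 learns bob's condition by subtracting one count from the other.

```python
# Toy records; all names and values are made up for illustration.
patients = [
    {"name": "alice", "age": 34, "has_condition": True},
    {"name": "bob", "age": 67, "has_condition": True},
    {"name": "carol", "age": 29, "has_condition": False},
    {"name": "dan", "age": 45, "has_condition": True},
    {"name": "erin", "age": 38, "has_condition": False},
]


def count_with_condition(rows):
    """Aggregate-only query: how many rows have the condition."""
    return sum(1 for r in rows if r["has_condition"])


# Query 1 covers all five people; query 2 covers the four people aged 60 or under.
total = count_with_condition(patients)
aged_60_or_under = count_with_condition([r for r in patients if r["age"] <= 60])

# Auxiliary knowledge: bob is the only person over 60, so the difference is his value.
bob_has_condition = (total - aged_60_or_under) == 1
print(total, aged_60_or_under, bob_has_condition)  # 3 2 True
```

Neither query returns a row or a small group, so a minimum-group-size rule alone would not stop this; adding noise to each answer is what blurs the difference.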
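A minimal sketch of the two differential-privacy ingredients from the Insights list, noise and a privacy budget, follows. It uses the textbook Laplace mechanism for counting queries with simple sequential composition, where each query spends part of a fixed epsilon budget. The class `DPCounter` and its parameters are hypothetical, and this is not a description of how AWS Clean Rooms implements differential privacy.

```python
import random


class PrivacyBudgetExceeded(Exception):
    """Raised when a query would spend more epsilon than remains."""


class DPCounter:
    """Laplace-mechanism counting queries under a fixed epsilon budget."""

    def __init__(self, rows, total_epsilon, seed=None):
        self.rows = rows
        self.remaining = total_epsilon
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def noisy_count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise PrivacyBudgetExceeded(
                f"requested {epsilon}, only {self.remaining} remaining")
        # Sequential composition: the epsilons of successive queries add up.
        self.remaining -= epsilon
        true_count = sum(1 for r in self.rows if predicate(r))
        # A count changes by at most 1 when one person is added or removed
        # (sensitivity 1), so Laplace noise with scale 1/epsilon gives epsilon-DP.
        return true_count + self._laplace(1 / epsilon)


if __name__ == "__main__":
    ages = [34, 67, 29, 45, 38]
    counter = DPCounter(ages, total_epsilon=1.0, seed=7)
    print(counter.noisy_count(lambda a: a > 40, epsilon=0.5))
    print(counter.noisy_count(lambda a: a < 30, epsilon=0.5))
    try:
        counter.noisy_count(lambda a: a > 60, epsilon=0.5)
    except PrivacyBudgetExceeded as exc:
        print("refused:", exc)
```

A smaller epsilon per query means larger noise (scale 1/epsilon) and a slower drain on the budget. Once the budget is spent, further queries are refused, which bounds the cumulative leakage that repeated queries would otherwise cause.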
Quotes:
- "Data privacy is a very important topic. We have many data breaches. Customers are more and more concerned about their privacy with many regulations like GDPR."
- "When we do data collaboration, sometimes the data custodian will do the data processing themselves, but sometimes we may do collaboration of many different parties."
- "The risk of exposure will increase [all] along this line because the data is going more and more far away from the data subject."
- "We want to minimize the amount of data that goes. For example, we are the data subject; when we sign up [as] users, we know we don’t need to give them the credit card number if there’s no reason."
- "If we remove the PII, we will decrease the usability of the data."
- "AWS Clean Rooms is a service [where] different data collaborators or the data custodian can share the data within one AWS Clean Rooms collaboration."
- "We can set the analysis rules to restrict the capabilities of different data, [that is, what] the data analysis can do to the data."
- "Differential privacy is about the amount of data that you expose from each query."
- "Every time when you query data, you are leaking some of the privacy from the data source."
- "Stripping PII blindly is not the solution because we want to also strike a balance on the data usability versus the privacy that we want to protect."