Improving Machine Learning Fairness and Addressing Social Identity Bias in AI Models
December 13, 2024

Machine-learning models often falter when tasked with predicting outcomes for individuals underrepresented in their training datasets. For example, a model trained predominantly on male patients to predict optimal treatments for chronic diseases may fail to make accurate predictions for female patients in clinical settings.
To tackle this issue, engineers traditionally balance datasets by removing data points until every subgroup is equally represented. However, this approach can require discarding substantial amounts of data, which degrades overall model performance.
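For a concrete picture of that conventional step, here is a minimal sketch of balancing by subsampling, assuming a dataset with a per-example `groups` array recording each point's subgroup; the function name is illustrative, not taken from the research:

```python
import numpy as np

def balance_by_subsampling(X, y, groups, seed=0):
    """Downsample every subgroup to the size of the smallest one.

    X: (n, d) features, y: (n,) labels, groups: (n,) subgroup ids.
    The discarded rows are simply thrown away, which is where the
    loss of overall accuracy comes from.
    """
    rng = np.random.default_rng(seed)
    min_size = min((groups == g).sum() for g in np.unique(groups))

    keep = np.concatenate([
        rng.choice(np.flatnonzero(groups == g), size=min_size, replace=False)
        for g in np.unique(groups)
    ])
    return X[keep], y[keep], groups[keep]
```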
MIT researchers have introduced a new method to enhance fairness in machine learning. Instead of broadly balancing datasets, their technique identifies and removes the specific data points that disproportionately contribute to the model's errors on minority subgroups. This approach eliminates far fewer data points than traditional balancing, preserving overall accuracy while significantly improving performance for underrepresented groups.
Additionally, the technique works even when a dataset lacks subgroup labels, a common situation in many applications. By pinpointing problematic data points, the method can also reveal hidden biases in training datasets, and it can be combined with other fairness-enhancing techniques, making it particularly valuable for high-stakes applications like healthcare.
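The paper's exact attribution procedure is not reproduced here, but the overall loop can be sketched with a simple stand-in: score each training example by how strongly its loss gradient opposes fixing the worst-performing group's mistakes, drop the highest-scoring examples, and retrain. The gradient-similarity proxy and the function names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def per_example_gradients(model, X, y):
    """Per-example gradient of the logistic loss w.r.t. the weights (binary 0/1 labels)."""
    p = model.predict_proba(X)[:, 1]      # P(y = 1 | x)
    return (p - y)[:, None] * X           # shape (n, d)

def harmfulness_scores(X_tr, y_tr, X_val, y_val, g_val):
    """Higher score = the training point contributes more to the worst group's errors."""
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # 1. Identify the subgroup with the lowest validation accuracy.
    accs = {g: model.score(X_val[g_val == g], y_val[g_val == g])
            for g in np.unique(g_val)}
    worst = min(accs, key=accs.get)

    # 2. Average loss gradient over that group's misclassified validation examples.
    mask = (g_val == worst) & (model.predict(X_val) != y_val)
    target = per_example_gradients(model, X_val[mask], y_val[mask]).mean(axis=0)

    # 3. A training point whose gradient is anti-aligned with this direction
    #    pulls the parameters away from fixing those errors, so removing it
    #    should help (a crude proxy that ignores curvature).
    return -(per_example_gradients(model, X_tr, y_tr) @ target)

# Usage: drop the k most harmful points and retrain on the rest.
# scores = harmfulness_scores(X_tr, y_tr, X_val, y_val, g_val)
# keep = np.argsort(scores)[:-k]
# fairer_model = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
```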
For instance, this method could mitigate bias in AI systems to ensure underrepresented groups aren’t misdiagnosed due to skewed training data. “Our approach identifies specific data points driving bias, allowing us to remove them and achieve better performance,” explains Kimia Hamidieh, co-lead author and EECS graduate student at MIT.
The research team, comprising experts from MIT and Stanford University, developed this technique as an extension of a prior method called TRAK. Their work, presented at the Conference on Neural Information Processing Systems, demonstrates that targeted data curation can improve worst-group accuracy across multiple datasets without compromising overall performance.
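For reference, the "worst-group accuracy" metric mentioned above is simply the accuracy of whichever subgroup the model serves worst; a minimal sketch, with illustrative names:

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, groups):
    """Accuracy of the subgroup on which the model performs worst."""
    return min(
        float(np.mean(y_pred[groups == g] == y_true[groups == g]))
        for g in np.unique(groups)
    )
```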
The method is easy to use and adaptable across different machine-learning models, making it a practical tool for practitioners. Future work aims to validate its performance further and to explore applications in real-world settings. This work represents a significant step toward building more equitable and reliable AI systems.
In a related study, researchers examined social identity biases in 77 large language models (LLMs). The research provides critical insights into how biases such as ingroup solidarity and outgroup hostility manifest in AI models and in their interactions with users. Fine-tuning with human feedback reduces these biases but does not eliminate them entirely.
Targeted data curation, such as removing biased sentences from the training corpus, was found to be effective at reducing these biases. However, ethical questions arise about the balance between reducing bias and preserving diverse perspectives in training data. The study underscores the need for robust alignment techniques, especially in conversational AI, to mitigate bias across dynamic, real-world contexts.
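The study's curation pipeline is not described here, but sentence-level filtering of this kind can be sketched as follows; the keyword heuristic standing in for a real bias classifier is a deliberately crude, hypothetical placeholder:

```python
import re

# Illustrative stand-in for a trained bias classifier: flag sentences that pair
# ingroup pronouns with praise ("ingroup solidarity") or outgroup pronouns with
# hostile language ("outgroup hostility"). A real pipeline would use a model.
INGROUP = {"we", "us", "our"}
OUTGROUP = {"they", "them", "their"}
PRAISE = {"superior", "righteous", "heroic"}
HOSTILE = {"inferior", "dishonest", "terrible"}

def flags_social_identity_bias(sentence: str) -> bool:
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    ingroup_solidarity = tokens & INGROUP and tokens & PRAISE
    outgroup_hostility = tokens & OUTGROUP and tokens & HOSTILE
    return bool(ingroup_solidarity or outgroup_hostility)

def curate(corpus: list[str]) -> list[str]:
    """Drop flagged sentences; everything else in the corpus is kept as-is."""
    return [s for s in corpus if not flags_social_identity_bias(s)]

# Example:
# curate(["We are heroic and they are dishonest.", "The weather is mild today."])
# -> ["The weather is mild today."]
```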
Future research should explore multilingual and multi-turn conversational settings to address remaining limitations and further improve AI fairness and reliability across diverse applications.