The Rise of the Data Elite: How AI Research is Reinforcing Power Imbalances

The rise of AI-powered tools is transforming our everyday lives. We marvel at the magic of ChatGPT and Midjourney, and we rely on more mundane AI-powered tools for credit profiling and email completion. However, the democratization of AI use is accompanied by global power disparities in AI research. A map from the “Internet Health Report 2022” shows that the landscape of AI research papers is heavily skewed towards a few countries and elite institutions. It reveals that more than half of the dataset usage in AI performance benchmarking came from datasets created by just 12 institutions and tech companies in the United States, Germany, and Hong Kong (China).

This map shows how often 1,933 datasets were used (43,140 times) for performance benchmarking across 26,535 different research papers from 2015 to 2020.
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research, Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster, 2021.

This major imbalance in who shapes the discourse about how AI should be used, and who should benefit from it, reinforces existing power imbalances. A discussion piece from Data Pop Alliance called “The Return of East India Companies: AI, Africa and the New (Digital) Colonialism” explores various aspects of AI colonialism in Africa. For instance, natural language processing (NLP) technologies for non-Western languages remain underdeveloped. Computer vision for self-driving cars relies on low-paid human workers to label hundreds of hours of data. Lax ethical standards and “data dumping” in countries with less stringent data protection regulations effectively turn local people and societies into AI guinea pigs. Despite the decreasing cost of training machine learning systems and the greater availability of data, the power dynamics in AI research and development continue to reflect the dominance of a select few.

While machine learning models and datasets are being developed in other parts of the world, their use in research papers and performance benchmarking is still limited. As consumers and as researchers, we have the power to seek greater diversity and inclusivity in AI research, and to advocate for ethical standards that address data inequalities. For example, STEM4ALL, the UNDP and UNICEF regional platform for Eurasia, aims to promote women and girls, share knowledge, raise awareness, and break gender stereotypes in STEM. Another way is to promote collaboration across borders and to develop our own datasets that contribute to the global conversation.