From Artificial Intelligence Bias to Inequality in the Time of COVID-19

By Miguel Luengo-Oroz, Joseph Bullock, Katherine Hoffmann Pham, Cynthia Sin Nga Lam, and Alexandra Luccioni — June 8, 2021

As United Nations Secretary-General António Guterres said during the 2020 Nelson Mandela Annual Lecture, “COVID-19 has been likened to an X-ray, revealing fractures in the fragile skeleton of the societies we have built.” Without a doubt, the COVID-19 pandemic has exposed and exacerbated existing global inequalities. Whether at the local, national, or international scale, the gap between the privileged and the vulnerable is growing wider, resulting in a broad increase in inequality across all dimensions of society. The disease has strained health systems, social support programs, and the economy as a whole, drawing an ever-widening distinction between those with access to treatment, services, and job opportunities and those without. Global lockdown restrictions have increased childcare and housework responsibilities, and most of this burden has fallen on women, deepening existing gender inequality [1], [2]. Indigenous populations worldwide find themselves more vulnerable to infection, often with less access to health services and hygiene measures, and with limited access to up-to-date scientific information about the virus and how to mitigate it [3]. Inequality has also pervaded the education sector, with only a subset of students able to attend safe in-person schooling or access online education when needed.

The COVID-19 pandemic has exposed and exacerbated existing global inequalities.

The pandemic has also increased the differences between countries, distinguishing between those that are able to access tests and diagnostic tools, personal protective equipment (PPE), medical equipment such as ventilators, and (eventually) vaccines, and those that cannot. International cooperation is critical, and as new diagnostics, medicines, and vaccines come through the development pipeline, there must be collaboration and mechanisms for joint procurement and pooling of risk. Support for initiatives such as the COVID-19 Vaccine Global Access Facility (COVAX) will help avoid myopic nationalist vaccine strategies that would increase inequality and worsen outcomes for humanity as a whole. When planning public health and economic interventions, governments need to take existing inequalities into account, lest they end up further worsening the situation [4]. To help address these issues, the United Nations Development Programme (UNDP) has proposed a comprehensive framework for providing socioeconomic support to countries and societies in the face of COVID-19, which covers five complementary dimensions: 1) essential health services; 2) social protection; 3) microeconomic response and recovery programs; 4) macroeconomic fiscal and financial stimuli and policies; and 5) community-led resilience and response systems, all while ensuring social equality and inclusion [5]. The UNDP stresses that all of these dimensions must be taken into account when designing and deploying interventions to ensure that all necessary aspects of human and societal well-being are addressed.

AI, COVID-19, and Inequality

In a recent article reviewing AI solutions against COVID-19 [6], we identified three main scales of AI applications: the molecular scale, the clinical scale, and the societal scale. We also highlighted the importance of assessing the maturity and feasibility of proposed interventions at each of these scales before deploying AI solutions. At the molecular level, much of the current research involves assisting drug discovery and development, and improving molecular diagnosis. Many of these applications remain at the research stage, given the challenges in synthesizing and testing compounds, as well as limitations in data and model sharing. Nonetheless, at least four candidate vaccines that reported the use of machine learning (ML) in their development have advanced to the clinical evaluation stage according to the World Health Organization (WHO) [7]. At a clinical level, AI has mostly been used to assist and improve patient-level assessment of COVID-19 via analysis of medical imagery and patient records. However, the extent to which the analysis from these techniques alone can be used for the diagnosis of COVID-19 is still debated by the medical community [8], and transparency and explainability of the proposed diagnoses remain overlooked by most AI approaches. Finally, on a societal scale, two crucial and complementary lines of research focus on modeling the spread of the pandemic across territories and regions, as well as its accompanying “infodemic” of misinformation. Despite reservations regarding whether models trained on data from one context are applicable in another, data-driven models remain paramount in analyzing the spread of the virus. Overall, while AI can be an important tool in fighting the pandemic, context-sensitive deployment is key. Working with stakeholders who have the necessary domain knowledge, in addition to developing appropriate model and data sharing solutions, is critical [9], [10].

At least four candidate vaccines that reported the use of machine learning in their development have advanced to the clinical evaluation stage according to the World Health Organization.

Will the AI systems applied in the fight against COVID-19 increase or decrease inequality? While the AI community is still in the early stages of application development, we have identified some persistent sources of bias that risk exacerbating inequality [9]. We believe, however, that some of these biases can be mitigated (and potentially overcome) if proper assessment is included during the application development process. Generally, AI applications against COVID-19 have been developed in and for countries in the Global North, resulting in a lack of information about how the resulting tools will impact the evolution of the disease and related policies in the Global South [11]. Even within highly developed countries, many modeling and data collection efforts overlook or neglect underrepresented minorities [12]–[14].

Will the AI systems applied in the fight against COVID-19 increase or decrease inequality?

AI systems can be inherently prone to bias, which can be introduced at several different points in the application development pipeline. First, bias can be present during problem scoping, that is, in the way the problem to be addressed by AI is framed, as well as in the extent of the work itself. Second, bias can be present in the data used to train an AI system, or in the way the data was labeled. Especially in healthcare settings, AI approaches can reproduce and amplify existing biases in medical data sets, oftentimes not accounting for data from minority groups [15], [16]. Third, bias can be present in the choice of the algorithms themselves and in how their configuration parameters are tuned. Finally, bias may influence how results are evaluated and interpreted, which in turn affects how AI outputs are used. A further challenge arises when models trained and tested in one setting are applied to another, and results are interpreted without the corrections needed for the new context. While this is not an exhaustive list of all potential sources of bias, these steps of an AI pipeline are all subject to both conscious and unconscious bias, which can reinforce existing inequalities as well as create new ones if not properly addressed.
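
To make the data-stage step concrete, the sketch below audits subgroup representation in a training cohort against reference population shares. It is a minimal illustration: the column name, group labels, and reference shares are hypothetical placeholders, not drawn from any real data set.

```python
# Minimal sketch of a data-stage bias audit: compare each subgroup's share of
# a training cohort against its share of a reference population. The column
# name, groups, and reference shares below are hypothetical placeholders.
import pandas as pd

def representation_gap(df: pd.DataFrame, group_col: str,
                       reference_shares: dict) -> pd.DataFrame:
    """Report each subgroup's data set share vs. its population share."""
    observed = df[group_col].value_counts(normalize=True)
    rows = []
    for group, expected in reference_shares.items():
        share = observed.get(group, 0.0)
        rows.append({"group": group,
                     "dataset_share": share,
                     "population_share": expected,
                     "gap": share - expected})
    return pd.DataFrame(rows)

# Synthetic example: a cohort that heavily underrepresents two groups.
cohort = pd.DataFrame({"ethnicity": ["A"] * 800 + ["B"] * 150 + ["C"] * 50})
print(representation_gap(cohort, "ethnicity",
                         {"A": 0.60, "B": 0.25, "C": 0.15}))
```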

Examples

In the sections below, we illustrate examples of applications in which biases result (or might result) in increased inequality.

Diagnosis

There are many algorithmic approaches for supporting COVID-19 diagnosis from computed tomography (CT) and X-ray scans, ranging from framing the diagnostic problem as a classification task (i.e., identifying healthy vs. COVID-19-positive individuals) to training neural networks to detect masses and patterns in lung scans [17]–[19]. This can be particularly problematic when attempting to draw meaningful conclusions based solely on medical imagery, especially in geographies with a high prevalence of other diseases that also affect the lungs but are not included in the training data sets (such as tuberculosis or HIV/AIDS [20]–[22]), which can be confused with COVID-19 and lead to misdiagnosis. In addition, much of the existing medical imaging research relies on small and poorly balanced data sets that mix data from several populations without proper traceability. Prior research has shown that when training data sets are imbalanced on gender, the performance of deep learning models in radiology decreases, in particular in the case of X-ray image data sets used to diagnose thoracic diseases [23]. Such biases might also reduce the performance of AI applications on CT and X-ray imaging of COVID-19 if not properly taken into account.
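
One practical safeguard against this failure mode is to report performance separately for each subgroup rather than as a single aggregate. The sketch below illustrates the pattern with synthetic data and a generic scikit-learn classifier; with a genuinely imbalanced real-world data set, the per-group scores would diverge where the aggregate hides it. All names and values here are illustrative assumptions.

```python
# Sketch of stratified evaluation: report the metric per subgroup instead of
# one aggregate score. The data, the 25/75 group split, and the logistic
# model are synthetic stand-ins for a real diagnostic classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
sex = np.random.default_rng(0).choice(["F", "M"], size=len(y), p=[0.25, 0.75])

model = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
scores = model.predict_proba(X[1500:])[:, 1]

print("overall AUC:", round(roc_auc_score(y[1500:], scores), 3))
for group in ["F", "M"]:
    mask = sex[1500:] == group          # evaluate each subgroup separately
    print(group, "AUC:", round(roc_auc_score(y[1500:][mask], scores[mask]), 3))
```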

Another structural source of bias is that not all regions can afford scanner equipment. As a result, the data used to train the related AI models will not be representative and the approaches proposed may not be systematically deployed in disadvantaged regions. On the other end of the technology requirement spectrum, there are some proposals involving mobile-based diagnosis approaches [24] that are potentially promising, but warrant further exploration and validation.

Much of the existing work that utilizes AI to analyze medical imagery does not provide transparency or interpretability, delivering a categorical verdict based solely on an incoming image. This “black-box” approach may be acceptable for human-in-the-loop deployment, where potential COVID-19-positive images are flagged for expert radiologists who then carry out further analysis manually. However, such approaches can be problematic in contexts where medical experts are scarce or overstretched. Fully automated AI pipelines must be assessed in terms of their clinical impact before being deployed, which includes carefully considering and mitigating risks in addition to assessing biases that might result in incorrect and unfair behavior.

Utilizing complementary medical data, such as information regarding a patient’s gender, age, and comorbidities, as well as clinical indicators, not only improves the accuracy of image-only approaches, but also produces results that are more interpretable for clinicians. For instance, a hybrid approach that merges algorithmic analysis of CT scans with clinical features to predict the severity of COVID-19 [25] reported high accuracy rates, and the set of clinical features identified as relevant by the algorithm was consistent with those identified by previous studies. This overlap is promising for eventual clinical monitoring of COVID-19 severity, both manually and using AI-infused approaches. However, these kinds of hybrid studies should also be replicated in other regions and complemented with clinical data from incoming cases around the world, achieving better global coverage and reproducibility.
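
A minimal sketch of this kind of hybrid architecture appears below: pre-computed image embeddings are concatenated with tabular clinical features and passed to a single classifier whose clinical-feature importances remain inspectable. The embedding dimension, feature set, and choice of gradient boosting are our own illustrative assumptions, not the method of [25], and all data is synthetic.

```python
# Sketch of a hybrid severity model: image embeddings from a CT-scan encoder
# concatenated with clinical features, then fed to one classifier. Every
# value here is synthetic; the 128-dim embedding and the three clinical
# features are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 500
image_embeddings = rng.normal(size=(n, 128))   # stand-in for CNN features
clinical = np.column_stack([
    rng.integers(20, 90, n),                   # age
    rng.integers(0, 2, n),                     # sex (binary-coded)
    rng.normal(5.0, 2.0, n),                   # e.g., a lab value
])
severity = rng.integers(0, 2, n)               # synthetic severity labels

X = np.hstack([image_embeddings, clinical])
clf = GradientBoostingClassifier().fit(X, severity)
# Importances over the clinical block stay inspectable by clinicians:
print("clinical feature importances:", clf.feature_importances_[-3:])
```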

Treatment

Research has consistently shown higher rates of infection, hospitalization, and death in ethnic minorities during the COVID-19 pandemic [12], [13], [26], [27]. Nonetheless, our understanding of these inequalities remains poor, which hinders the development of solutions to combat COVID-19 and the disparities it magnifies. Gathering sufficient and accurate data on all of the social determinants of health, including race and ethnicity, is critical for effective research and development of both medical and public health interventions. However, inadequate and biased data collection is prevalent in practice. For example, data collection mechanisms are often poorly designed, with inconsistent ethnicity and race labeling [28], [29]. In fact, a systematic review has found that of 1518 COVID-19-related clinical trials registered on ClinicalTrials.gov, only one randomized controlled trial and five observational studies collected data on ethnicity [30]. Overall, fragmented and incomplete data makes it challenging for AI to succeed in furthering our understanding of COVID-19 and devising appropriate interventions.

The inequitable selection of participants, as well as the inconsistent presentation of demographic data, has been a prevalent issue in pharmaceutical trials since long before COVID-19 [31]. Prior research has shown that demographic differences often translate into differences in physical outcomes for individuals and communities. For example, immune profiles differ from person to person as a result of genetic, evolutionary, and environmental factors such as age, ethnicity, comorbidities, geographic location, and nutrition [32], [33]. Hence, these dimensions should be properly accounted for when designing ML-based methodologies. We must be particularly cautious to ensure that data limitations do not lead us to develop biased algorithms that rely on readily available and seemingly effective, but in fact problematic, proxies; using healthcare costs as a proxy for health needs, for example, was found to be racially biased [34]. These kinds of AI applications risk not only making inaccurate predictions, but also reinforcing systemic injustices.
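
The following sketch illustrates the proxy problem in the spirit of [34], on purely synthetic data: two groups with identical latent health need, where an access gap depresses one group's recorded costs, so that a cost-based flagging rule selects that group far less often than a need-based rule would.

```python
# Sketch of a proxy-label audit in the spirit of [34]. The data is entirely
# synthetic: both groups share the same latent health need, but an access
# gap makes group 1's recorded healthcare costs systematically lower, so a
# cost-based flagging rule under-selects group 1 for extra care.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
group = rng.integers(0, 2, n)            # 0 = majority, 1 = minority
need = rng.normal(size=n)                # latent health need, equal by group
cost = need - 0.8 * group + rng.normal(scale=0.5, size=n)  # access gap

for name, label in [("health-need label", need > np.median(need)),
                    ("cost-proxy label ", cost > np.median(cost))]:
    rates = [label[group == g].mean() for g in (0, 1)]
    print(f"{name}: flags {rates[0]:.0%} of group 0, {rates[1]:.0%} of group 1")
```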

Although the private sector is playing a critical role in devising interventions against COVID-19, its lack of transparency may reinforce biases and inequalities. Most of the vaccine candidates that utilized ML in their development came from corporations that made little information about their ML approaches available, making it difficult for independent researchers to inspect the biases that might arise from the collection and measurement of data, as well as from the evaluation, aggregation, and deployment processes [35]. Research conducted in the private sector largely relies on data, such as knowledge graphs, that are not publicly accessible, and proprietary algorithms hinder the discovery, evaluation, and rectification of algorithmic biases [34].

All in all, given that AI is playing an increasingly important role in expediting the discovery and development of medical solutions, it is particularly important to ensure that the data gathered is representative of global populations and that appropriate mechanisms to audit algorithms and dissect any possible biases are in place. Otherwise, solutions developed and tested in ways that are subject to systemic, clinical, and algorithmic biases may have unintended consequences in vulnerable populations.

Epidemics

The modeling of epidemics is relevant both for understanding potential infection trajectories and for informing operational planning. Epidemiological models need careful fine-tuning to regional variables in order to capture age, socioeconomic status, and cultural norms. One of the most frequently used approaches is susceptible-infected-recovered (SIR) modeling, in which models are usually designed based on regional statistics capturing age, sex, and potentially other variables if data are available. However, these approaches often average over demographic and societal structures, meaning that differences in behavior between groups are missed. Furthermore, developing epidemiological models in certain settings and applying them to others with different cultural norms can result in incorrect predictions. In fact, we have already seen examples of models failing to capture and explain differences in COVID-19 transmission trends between European and African contexts [36], [37]. If models are not developed in collaboration with regional experts, local populations are at severe risk of being left behind.
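
For concreteness, a minimal SIR formulation is sketched below. The transmission and recovery rates are exactly the kind of parameters that must be re-fit to local contact patterns and demographics; the values used here are illustrative, not estimates for any real population.

```python
# Minimal SIR sketch. beta (transmission) and gamma (recovery) are the
# parameters that must be re-estimated per region; the values below are
# illustrative only.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

beta, gamma = 0.30, 0.10            # per-day rates; R0 = beta / gamma = 3
y0 = [0.999, 0.001, 0.0]            # population fractions: S, I, R
sol = solve_ivp(sir, (0, 180), y0, args=(beta, gamma),
                t_eval=np.linspace(0, 180, 7))
for t, i in zip(sol.t, sol.y[1]):
    print(f"day {t:5.0f}: {i:.1%} infected")
```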

Disparities in modeling efforts have also grown out of differences in data availability. For example, agent-based approaches to modeling disease spread, while able to better account for heterogeneous transmission dynamics, often require detailed data inputs. These can be challenging to acquire in under-served and ill-documented settings such as informal settlements. Similarly, many modeling approaches are designed to assess the potential for certain policy interventions, yet many of these interventions, such as social distancing or household quarantine, are impossible in highly crowded spaces where individuals need to leave their homes for water or aid collection.
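
To illustrate the contrast with the aggregate SIR sketch above, here is a toy agent-based formulation in which each agent carries an individual attribute (a household-crowding level) that modulates its infection risk. Such models can represent heterogeneous settings, but only when the corresponding individual-level data can actually be collected; every value below is an illustrative assumption.

```python
# Toy agent-based sketch: unlike the aggregate SIR model, each agent has its
# own attributes -- here a household-crowding level that scales its daily
# infection risk. All parameters are illustrative placeholders.
import random

random.seed(0)
N, DAYS = 2000, 60
crowding = [random.choice([2, 4, 10]) for _ in range(N)]  # per-agent attribute
state = ["S"] * N
state[0] = "I"                                            # seed one infection

for day in range(DAYS):
    infected = state.count("I")
    for a in range(N):
        # susceptible agents face risk proportional to prevalence * crowding
        if state[a] == "S" and random.random() < 0.00002 * infected * crowding[a]:
            state[a] = "I"
        elif state[a] == "I" and random.random() < 0.1:   # ~10-day recovery
            state[a] = "R"

print({s: state.count(s) for s in "SIR"})
```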

In addition, many data sets that have served as important indicators of possible disease spread and recovery, as well as tools to aid in SARS-CoV-2 transmission mitigation efforts, rely on access to certain technologies. For example, Google mobility data [38] has provided valuable insights into the movement patterns of individuals and has been used in several modeling efforts [39], [40]. However, such data fail to capture the portions of society that do not have access to the mobile technology used in creating these data sets. Newly developed tools to fight the pandemic can also rely on technology that is unavailable to certain members of society, such as contact tracing applications with particular smartphone and operating system requirements.

We have recently seen a rise in the use of ML for modeling the epidemiological trends of COVID-19 [41], [42], which might be heavily influenced by data limitations and biases [6], [43]. Indeed, while many classical modeling approaches can be interrogated mathematically and computationally, we are still much less able to interpret the reasoning behind the outputs of an AI system. Therefore, while AI models may be powerful prediction tools, we need the ability to better interrogate models that have the potential to significantly impact people’s lives.

Infodemics

The propagation of mis- and disinformation around COVID-19 may act as yet another driver of inequality, and online social media platforms represent a particularly rich environment for the spread of infodemics. Research has shown that information is disseminated on these platforms in a viral manner, reproducing at a rate similar to that of a pandemic (i.e., exponentially) [44]. Although the rate at which true versus false claims are amplified appears to vary across platforms [44], research on Twitter has found that social media users may be more likely to share false information because it is novel [45]. The reach of misinformation may be further extended by the automated recommendation algorithms underlying many online social media platforms, which seek to identify and promote popular or viral posts to maximize engagement and garner attention [46]. Despite the widespread proliferation of misinformation, only a fraction of this content is flagged and ultimately removed or corrected [47].

Some of the most vulnerable populations in our society—such as the elderly, minorities, and other populations with low health literacy—may also be the most susceptible to the infodemic [48]. On the one hand, since much of the false information about COVID-19 involves fake cures, rumors about invincibility, and/or ineffective preventive measures, misinformation could lead these already-vulnerable populations to engage in unnecessarily risky behavior. On the other hand, these populations may be more likely to believe misinformation due to a tendency toward “inequality-driven mistrust” in which a historical legacy of discrimination and mistreatment at the hands of society and the medical community makes them more likely to question official information and believe in conspiracy theories [49]. In sub-Saharan Africa, researchers have argued that this skepticism extends to “distrust of philanthropic institutions, distrust of developed nations, and even distrust of leaders in their own respective countries” [50].

Systematic fact-checking efforts to proactively identify and correct false information may fail to reach these vulnerable populations because these populations may engage with misinformation through private or interpersonal channels [51], and in local languages or dialects [52]. This makes it difficult for fact checkers to identify the relevant pieces of false information and to reach these populations with corrected information. Many efforts to combat the infodemic so far have focused on data sources that are easy to mine with AI (e.g., Facebook, Twitter, and online news media), rather than on offline channels such as radio or TV which may act as primary sources of information in the developing world [53]. These efforts also tend to focus on widely spoken languages such as English, Spanish, Arabic, French, and Portuguese, rather than on under-represented regional languages for which natural language processing tools may not yet be available.

Another repercussion of the infodemic has been the targeting and stigmatization of vulnerable groups. For example, there has been a reported rise in hate crimes [54] and online hate speech [55] against Chinese and other Asian minorities. There has also been a rise in xenophobic and antimigrant sentiment more generally [56], both because of fears that arriving migrants will transmit the virus, and because of the perceived strain that these migrants will place on already-overburdened public services.

Discussion

AI applications have the potential to positively contribute to the fight against COVID-19. However, AI can also amplify the biases and inequalities that have emerged or become more extreme during the pandemic. As in any other application of AI, bias can be introduced at multiple points throughout the modeling pipeline. As COVID-19 furthers inequality along every dimension of society, it is important to identify and address AI applications which might contribute to this trend.

AI researchers and practitioners should pay special attention to bias introduced in the problem definition and data collection stages. Public health algorithms play important roles in decision-making, from shaping critical care protocols to deciding how to distribute vaccines. Women and minorities are often not properly represented in the data sets used to develop medical treatments and services. In addition, nonpharmaceutical interventions are increasingly based on measures constructed from digital traces (e.g., mobility patterns recorded from cell phone usage), although these do not account for individuals without access to mobile networks or the internet. Finally, the spread of mis- and disinformation can be amplified by AI, which disproportionately impacts those with lower health literacy.

In all of these examples, AI-driven biases and their implications for inequality should be assessed from the initial project development stages through the presentation and use of outputs. Furthermore, AI systems should be adapted to local contexts, following and accounting for cultural and social norms. It is also important to recognize that, in some cases, proposed solutions might be fair and efficient with respect to a particular problem statement, yet increase inequality along other dimensions. Contrary to other AI applications, where the goal is to identify a single model that beats a single metric on a benchmark, in situations where vulnerable populations are at risk a more sensitive approach should be taken, with outliers used as metrics to assess failure as well as success. This analysis may require the problem to be approached from different angles. For example, outliers in a patient group might represent diversity and, rather than focusing on average performance, the more relevant task for modelers might be to understand for whom the model performs worst.
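
As a minimal sketch of this shift in evaluation, the function below reports a metric for every subgroup and selects on the worst rather than the average; the group labels, toy data, and choice of accuracy as the metric are placeholders.

```python
# Sketch of worst-group evaluation: report the metric for every subgroup and
# judge the model by its minimum, not its average. Data and groups are toy
# placeholders.
import numpy as np

def worst_group_accuracy(y_true, y_pred, groups):
    accs = {g: float(np.mean(y_true[groups == g] == y_pred[groups == g]))
            for g in np.unique(groups)}
    return min(accs.values()), accs

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
worst, per_group = worst_group_accuracy(y_true, y_pred, groups)
print(per_group, "-> select models on the worst group:", worst)
```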

Some key concepts to help reduce bias in AI applications include: the transferability of models and their adaptation to local contexts; federated learning strategies, which may make it possible to include patient data from multiple cohorts in a privacy-preserving way; and the deployment of interpretable models that allow users to interrogate the patterns and reasons behind model decisions, which is particularly relevant in high-risk settings. We hope this article will stimulate reflection in the parts of the AI community aiming to support the COVID-19 response, and will encourage researchers to assess whether biases in their AI applications will amplify inequality from a health, economic, and social perspective.
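
As an illustration of the federated idea, the sketch below implements a bare-bones federated-averaging loop over three simulated hospital cohorts: each site fits a local logistic model and only the weights, never the patient records, are sent to the server for averaging. A real deployment would add secure aggregation and formal privacy protections; this is a sketch of the communication pattern only, with all data synthetic.

```python
# Bare-bones federated-averaging (FedAvg) sketch: each simulated hospital
# runs local SGD on its own cohort; the server averages the weight vectors.
# Patient records never leave the sites. All data is synthetic.
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=50):
    w = weights.copy()
    for _ in range(steps):                     # logistic-regression SGD
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])
cohorts = []                                   # three simulated hospitals
for _ in range(3):
    X = rng.normal(size=(200, 2))
    y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)
    cohorts.append((X, y))

global_w = np.zeros(2)
for _ in range(10):                            # federated rounds
    local_ws = [local_update(global_w, X, y) for X, y in cohorts]
    global_w = np.mean(local_ws, axis=0)       # server averages weights only
print("learned weights:", global_w.round(2), "(true direction:", true_w, ")")
```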

ACKNOWLEDGMENTS

The United Nations Global Pulse is supported in part by the Governments of Sweden and Germany and in part by the William and Flora Hewlett Foundation. The work of Joseph Bullock was supported by the United Kingdom Research and Innovation – Science and Technology Facilities Council (UKRI-STFC) under Grant ST/P006744/1. The work of Alexandra Luccioni was supported by funding from IVADO and Mila.

Author Information

Miguel Luengo-Oroz is currently the Chief Data Scientist with the United Nations Global Pulse, New York, NY, USA, and the Artificial Intelligence (AI) and Big Data Innovation Initiative of the United Nations (UN) Secretary-General. He is also with the Universidad Politécnica de Madrid, Madrid. Over the last decade, as the first data scientist at the UN, his research has focused on bringing AI to operations and policy in domains including poverty, food security, epidemics, refugees, conflict, human rights, gender, infodemics, and climate. He has participated in multiple advisory bodies, such as the expert group on AI strategy for the government of Spain. He is the Founder of the global health social enterprise SpotLab, Madrid, Spain, and the Inventor of the games-for-health platform MalariaSpot.org. Dr. Luengo-Oroz is an Ashoka Fellow. He received the MIT TR35 Award and the European Responsible Research and Innovation Award.

Joseph Bullock is currently pursuing the Ph.D. degree with the Department of Physics, Institute for Data Science, Durham University, Durham, U.K. He is also a Researcher with United Nations Global Pulse, New York, NY, USA, and holds a position as a Research Associate at the RiskEcon Lab, Courant Institute of Mathematical Sciences, New York University, New York. His research focuses on developing artificial intelligence and numerical modeling techniques in the context of humanitarian and physics-based scenarios, with specific applications in particle physics, automated satellite image analysis, and disease modeling. In addition, he works on designing theoretical frameworks for quantifying and communicating uncertainties in machine learning systems and understanding why certain models fail.

Katherine Hoffmann Pham received the B.A. degree in international relations and economics and the M.A. degree in international policy studies from Stanford University, Stanford, CA, USA, both in 2011. She is currently pursuing the Ph.D. degree in information systems with the NYU Stern School of Business, New York, NY, USA. She spent four years working at Innovations for Poverty Action, Washington, DC, USA. She is currently a Researcher with the United Nations Global Pulse, New York. Her research examines how big data, machine learning, and technology can be used to address policy problems, with a focus on the digital future of urban mobility and international migrant and refugee movements.

Cynthia Sin Nga Lam is currently an Infodemic Analyst at the United Nations Global Pulse, New York, NY, USA, supporting the COVID-19 response; a Contract Consultant at the World Health Organization, Geneva, Switzerland, advising on digital health communications; and an agile-certified (PMI-ACP) practitioner managing various innovation projects. She is passionate about connecting the dots across her varied work and strives to innovate by making connections across disciplines. Her research interests include the application of artificial intelligence in various areas of health (public health, health education, and consumer health) and health literacy.

Alexandra Luccioni received the Ph.D. degree in cognitive computing from the Université du Québec à Montréal, Montréal, QC, Canada, in 2018. She is currently a Researcher working on Artificial Intelligence for Humanity Initiatives at the Mila Institute, Montréal, QC, Canada, where she leads projects at the nexus of artificial intelligence (AI) and social issues, such as climate change, education, and health. She spent two years working in applied research, applying deep learning in industries such as customer service and banking. Since joining Mila in 2019, she has organized and led many AI for social good initiatives, conferences, and workshops. She is also highly involved in her community, volunteering for initiatives such as Women in Machine Learning, Climate Change AI, and Kids Code Jeunesse.