Health Data as Political Artifacts

Health data are not biased in a reductive and negative sense, but are political artifacts—marked by, reflecting, and shaping systems of power in society, writes Collective member Kadija Ferryman

Representative data alone do not mean that AI tools used in medical care will not exacerbate existing, or even create new, health disparities. Photo: Colourbox.

Not too long ago, I was at home, in front of my laptop, watching a conference online, as many of us are doing these days. The topic was artificial intelligence in health, and the speaker was discussing the development of a machine learning tool to detect prostate cancer. During the question-and-answer portion of the webinar, I asked how health disparities in prostate cancer had been considered during the development of this tool, since that hadn't been mentioned during the talk. I knew that race mattered in prostate cancer: I have several relatives dealing with this disease, and I know that racial disparities exist.

For example, Black men have higher prostate cancer incidence and mortality rates than white men. Research has also shown that Black men are less likely to receive treatment that follows prostate cancer care guidelines. They are also less likely to receive diagnostic MRIs, and they experience a longer time between diagnosis and the start of treatment than white men do. And though there is some research on biological contributions to racial differences in prostate cancer, there is also evidence showing that Black and white men who receive similar care have similar outcomes.

The response I received was that the team used a representative dataset, meaning that racial and ethnic groups were proportionally represented in the data. Though representative datasets are a good starting place, representative data alone do not mean that AI tools used in medical care will not exacerbate existing, or even create new, health disparities. It is clear that unrepresentative datasets can lead to problematic, or biased, AI tools. However, it is not enough for racial and ethnic groups to be proportionally present in health data; it also matters how they are represented in those data, and how the data are understood and interpreted.
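
To make this distinction concrete, here is a minimal sketch in Python. The dataset, column names, and figures are entirely hypothetical; the point is only that a dataset can match population proportions in its counts while still encoding unequal patterns of care in its contents.

```python
# Minimal sketch with invented data: proportional representation
# does not rule out unequal representation in the data's contents.
import pandas as pd

# Hypothetical patient records (all column names and values are invented).
records = pd.DataFrame({
    "race_ethnicity": ["Black", "white", "Black", "white",
                       "white", "white", "Black"],
    "stage_at_diagnosis": [4, 1, 3, 2, 1, 1, 4],
})

# Hypothetical population benchmarks, e.g., census shares.
population_share = {"Black": 0.41, "white": 0.59}

# Check 1: are groups proportionally present? (the "representative" check)
dataset_share = records["race_ethnicity"].value_counts(normalize=True)
print({g: round(dataset_share.get(g, 0.0) - s, 2)
       for g, s in population_share.items()})  # near zero: proportional

# Check 2: how are groups represented in the data's contents?
# Differences here (e.g., later stage at diagnosis, reflecting unequal
# access to screening) would be learned and reproduced by a model
# trained on these records, even when Check 1 passes.
print(records.groupby("race_ethnicity")["stage_at_diagnosis"].mean())
```

In this invented example, the group shares in the records roughly match the population benchmarks, yet stage at diagnosis still differs sharply by group. That difference, not the head count, is what a model trained on these records would learn.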

In the Fairness in Precision Medicine research study [1], we examined how clinical guidelines and the care people receive shape how people are represented in clinical data. One of our interviewees explained that lung cancer screening guidelines made it harder for Black individuals in the US to access screening. Clinical data showing that Black patients are more likely to present with advanced stages of lung cancer are likely connected to this lack of early screening. The issue here, then, is not one of data representativeness; instead, it brings into relief how health data represent patterns of racial discrimination. Thus, even a dataset that is representative of population groups is stamped with social histories.

Teams designing AI tools for health need to include anthropologists, sociologists, and historians who can provide essential information about data histories, contexts, and social processes. Photo: Pexels.

As the development of AI tools in health continues, we must move away from the language of “bias,” as in biased datasets. Instead, we must realize that datasets are not biased in a reductive and negative sense, but that health data are political artifacts—meaning that they are marked by, reflect, and shape systems of power in society. As Kinjal Dave argues,

“When we stop overusing the word ‘bias,’ we can begin to use language that has been designed to theorize at the level of structural oppression… By using the language of bias, we may end up overly focusing on the individual intents of technologists involved, rather than the structural power of the institutions they belong to.”

When we think of health data as political artifacts, we see that seemingly reasonable fixes, like having representative datasets or removing race variables from analyses, may not be enough. Health data analysis projects must also take the history and context of the data seriously. Data analysts can investigate the historical and current processes of marginalization that might be represented in the data. To do this, teams designing AI tools for health may also need to expand to include anthropologists, sociologists, historians, and others who can provide essential information about the data histories, contexts, and social processes that animate the data.

Thinking of health data as political artifacts and expanding networks of expertise may also prompt critical reflection on how these same histories and current discriminatory and exclusionary practices factor into whose values, worldviews, and experiences, not just technical know-how, are shaping the way analyses are done. Racialization, discrimination, and other processes of marginalization matter in health data, in the sense that they are important, but they are also made real, or material, in these data. If we treat these data as the artifacts they are, we open up new paths for data work in the service of health equity.


[1] Ferryman, K., & Pitcan, M. (2018). Fairness in Precision Medicine. Data & Society Research Institute.

By Kadija Ferryman
Published Feb. 7, 2022 10:30 AM - Last modified Feb. 7, 2022 3:12 PM
