‘Reading Race’

Paul Taylor

Researchers led by a team from Emory University recently announced that they had used artificial intelligence to predict patients’ self-reported racial identity from medical images. It is an unexpected, unsettling result.

Neither expert radiologists nor the computer scientists who trained the algorithms can work out what it is in the images that the algorithms are using as the basis for the classification (the researchers compared three different deep neural network architectures). The result is also astonishingly accurate and weirdly robust. Using a metric that ranges from 0.5 for totally random to 1.0 for absolutely perfect, the algorithms scored between 0.95 and 0.99 on the classification of subjects as Black, white or Asian when trained using chest X-rays, and between 0.80 and 0.96 using mammograms, CT scans and spinal X-rays.
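
A metric that runs from 0.5 for random guessing to 1.0 for perfect classification is consistent with the area under the ROC curve (AUC). The text above doesn't name the metric, so that identification is an assumption. For a single yes/no label, AUC can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. With three categories it would presumably be computed one category against the rest. A minimal sketch, using invented scores rather than anything from the study:

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks membership of one category, 0 everyone else.
labels = [1, 1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]        # every positive outranks every negative
uninformative = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # the scores carry no information

print(auc(perfect, labels))
print(auc(uninformative, labels))
```

On this reading, a score of 0.63 means the algorithm ranks a random pair correctly a little under two times in three.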

Results as good as these are often a sign that something has gone wrong, that the method is flawed in some way. ‘Reading Race’, however, is an exceptionally careful and thorough piece of work. The experimenters used data from different hospitals, and carried out a wide range of experiments, such as training the algorithms on one dataset and testing them on a completely different one, to ensure that their results were valid. They looked to see if the algorithms were picking up diseases that Black people are more likely to have, or if age, sex, bone density or BMI was giving away the subjects’ racial identity. None of these had an impact strong enough to explain the effect.

The researchers tried removing parts of the images, blurring them and reducing the resolution. The worse the data that was fed into the algorithms, the worse the performance, but even with images that were so degraded as to be unrecognisable as X-rays some information about race was still being picked up. Almost unbelievably, on a set of chest X-rays reduced to a scale of four pixels by four the algorithm scored 0.63, only a little better than chance, but still better than chance.

I thought for some time about that ‘almost’. It stretches credulity that an algorithm could be able to pick up even a hint of a socially defined construct from just a few bytes of physical data. The paper is available on a preprint server in advance of peer review and other readers may well spot flaws that aren’t obvious to me. It is possible there is something unusual about the mix of patients at the university hospitals where much of the data comes from, or that patients whose race is recorded in a way that can be associated with their medical images are atypical and the result is in some way an artefact of the data.

The authors are clear that the result doesn’t mean there is some fundamental difference between races. They cite a 1986 paper summarising the reasons that race is not a biologically useful concept. Geographic variation in gene frequency is gradual and doesn’t fall into a natural set of categories. Individuals who share one trait will differ in another. There is no evidence for a package of genes that differentiates between races, and no biological reason to focus on the traits that are central to the assignment of race. Genetic differences within racial groups are orders of magnitude greater than those between racial groups.

Yet race is nevertheless an important variable in medicine. It is strongly predictive of poor outcomes across a range of diseases. This is not simply because in countries such as the US and the UK it correlates with lower socioeconomic status or less education: ‘Race is not confounded by these other variables, it is antecedent to them.’ The urgent medical question is how best to respond to the impact that race, as a social and political construct, has on health outcomes.

Crucial treatment decisions for patients recovering from Covid-19, for example, are made on the strength of measurements of lung function, assessed using devices known as spirometers. Spirometry software has a built-in adjustment for race, which assumes that the lung capacity of Black people is on average 10 to 15 per cent smaller, and that of Asians 4 to 6 per cent smaller, than that of white people. This means you could have two patients, one Black and one white, with the same lung function and the spirometry reading would indicate that only the white patient required treatment.
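
The arithmetic behind that last sentence can be sketched as follows. Every number here is hypothetical: the 12 per cent and 5 per cent reductions are simply values inside the ranges quoted above, and the reference volume and the 80 per cent threshold are illustrative assumptions, not the settings of any actual spirometer.

```python
# Hypothetical scaling of the 'expected' lung volume by recorded race.
RACE_FACTOR = {"white": 1.00, "Black": 0.88, "Asian": 0.95}

# Assumed rule for illustration: flag a patient whose measured volume
# falls below 80 per cent of the volume predicted for them.
THRESHOLD = 0.80


def fraction_of_predicted(measured_litres, reference_litres, race):
    """Measured volume as a fraction of the race-adjusted predicted volume."""
    predicted = reference_litres * RACE_FACTOR[race]
    return measured_litres / predicted


def flagged_for_treatment(measured_litres, reference_litres, race):
    return fraction_of_predicted(measured_litres, reference_litres, race) < THRESHOLD


# Two patients with identical measured lung function (3.4 litres) and the
# same unadjusted reference value (4.5 litres), differing only in recorded race.
for race in ("white", "Black"):
    frac = fraction_of_predicted(3.4, 4.5, race)
    print(race, round(frac, 3), flagged_for_treatment(3.4, 4.5, race))
```

Because the adjustment lowers the expected value for the Black patient, the same measurement counts as a larger share of 'normal' and falls on the other side of the threshold.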

The idea that Black people have smaller lungs can be traced back to the racist ideology of the American South, but the values used in adjustments today are taken from a 1999 survey. There is no known genetic basis for the difference, and many doctors argue that it can be explained by socioeconomic factors, body proportions and occupational hazards, and that these are the factors we should be adjusting for.

The American Heart Association guidelines meanwhile categorise Black patients as at lower risk of death from heart failure, which may make them less likely to be allocated to more intensive forms of care. The guidelines give no rationale for this adjustment. And the algorithms used to estimate kidney function from measurements of creatinine levels in the blood are routinely corrected for race because Black people, on average, have higher creatinine levels, but the reasons for this are not understood and it is unclear if the adjustment is appropriate.

In these cases it could be argued that including race as a variable in calculation is at worst a relic of racist ideology and at best an inadequate proxy for variables we should be measuring directly. But the social significance of race is so great that, in other cases, ignoring it will exacerbate rather than remove inequalities.

Take for instance an algorithm used to assess candidates for a university programme. If it is blinded to the applicants’ race, the consequences of race on candidates’ scores for other variables, such as educational attainment, will still have an impact on the outcome. A fairer algorithm could be created by including race explicitly in a causal model of the relationships between these variables. This would be more complicated and less transparent, however, and it is easy to see why simply removing any explicit reference to race might appear the pragmatic solution. Yet the conclusion of ‘Reading Race’ is that, at least for machine learning algorithms applied to medical imaging, this just won’t work. If AI is as good as it seems to be at working out for itself who is Black and who isn’t, we can’t hope to overcome racial bias by blinding an algorithm to a patient’s race.
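
The admissions example can be made concrete with a toy simulation. This is a sketch under invented assumptions, not a model of any real programme: two groups, labelled A and B, differ by ten points on average in an 'attainment' score, and the admission rule looks only at that score.

```python
import random

random.seed(0)  # fixed seed so the toy simulation is repeatable

CUTOFF = 55  # hypothetical admission threshold, applied to attainment only


def simulate(n=2000):
    """Return (group, attainment) pairs; the ten-point gap is an assumption."""
    people = []
    for _ in range(n):
        people.append(("A", random.gauss(60, 10)))
        people.append(("B", random.gauss(50, 10)))
    return people


def admit(attainment):
    # The rule never sees the group label: it is 'blind' in the narrow sense.
    return attainment >= CUTOFF


def admission_rate(people, group):
    scores = [a for g, a in people if g == group]
    return sum(admit(a) for a in scores) / len(scores)


def guess_accuracy(people):
    """How often the group can be guessed from the attainment score alone."""
    correct = sum(("A" if a >= CUTOFF else "B") == g for g, a in people)
    return correct / len(people)


people = simulate()
print("admission rate, group A:", admission_rate(people, "A"))
print("admission rate, group B:", admission_rate(people, "B"))
print("group guessed from score alone:", guess_accuracy(people))
```

With these assumed distributions the blind rule admits about 69 per cent of group A and about 31 per cent of group B in expectation, and the score on its own identifies the group correctly well over half the time. Removing the label leaves its trace in the other variables.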


  • 27 August 2021 at 4:05pm
    Carl Jordan says:
    Could it be the case that artificial intelligence is as rubbish as it sounds and that algorithms are yet more Silicon Valley snake oil?

  • 27 August 2021 at 10:36pm
    Bob K says:
    "Spirometry software has a built-in adjustment for race, which assumes that the lung capacity of Black people is on average 10 to 15 per cent smaller, and that of Asians 4 to 6 per cent smaller, than that of white people."

    I'm struggling with the rationale for this. Surely the less lung capacity you have, the more important it is to deal with any reduction? Unless they are picking up on a trend that may continue in the future. It seems bizarre.

    • 29 August 2021 at 9:51am
      neddy says: @ Bob K
      My understanding of the point is that for a White person's lung capacity to be the same as a Black person's, the White individual's lung capacity will have fallen by 10% to 15% at a minimum. That is a significant reduction. But a measure of reduced capacity could be determined without comparing Black and Asian persons directly with White individuals. These types of measure strike me as being IQ tests by another route. So are whites superior now because of larger lung capacity? Is that the point? We all know that is how it will be shouted out by society's racists. Some studies should simply not be allowed.

  • 31 August 2021 at 5:40pm
    staberinde says:
    I'm not clear what the writer believes the problem to be.

    If black people tend to have lower lung capacity than white people, that's surely useful for a doctor to know, one supposes, if they're deciding whether and when to put a Covid patient on ventilator support.

    The problem with a "race = culture not biology" argument is that diseases like sickle cell anaemia disagree.

    Do you really want a colour-blind healthcare system that simply ignores racial differences? Or do you want one that uses all the data available in order to make the argument: "This cohort has these needs, therefore resources should be allocated and policies developed accordingly." Who loses out in a one-size-fits-all healthcare model? Minorities.

    If the writer really wants to worry about AI, they might note how Facebook can tell with high accuracy whether someone on their platform is gay, by using data inference. Such technology in the hands of homophobic nation states would put lives in terrible danger.

    • 1 September 2021 at 1:32am
      neddy says: @ staberinde
      Do we want a Dulux colour chart health care system?

    • 1 September 2021 at 10:20am
      staberinde says: @ neddy

      I want resources and policies to follow clinical need, not ideological purity. In a white majority society, healthcare's default patient is white. A colour-blind healthcare system layered over that will ignore the clinical needs of non-white patients, causing them to be under-served.

      Alternatively, you might use all the data available to model the needs of different groups in order to resource more effectively (in the macro) and design tailored interventions for individuals (in the micro).

      I'll take a Dulux colour chart system for now, as a waypoint in the journey to a JPEG system with 16.8 million colours in it. You can keep your shitty 1950s one size fits all NHS.

    • 1 September 2021 at 11:27am
      neddy says: @ staberinde
      Separate But Equal. That seems to be the policy you are advocating: the very one rejected by the USA Supreme Court in 1954, regarding access to education, because it would deprive children from minority groups of equal educational opportunities (ie, equal to those available to White children). I suggest that colour coded health policies would similarly deprive persons from minority groups of equal healthcare, when compared to that available to persons of the dominant shade. The NHS may be a "shitty 1950s one size fits all system"; but the USA Supreme Court made its decision in the 1950s as well. Perhaps the 1950s weren't all bad.

    • 1 September 2021 at 6:49pm
      staberinde says: @ neddy

      So, Dr. Neddy, I take it your response to a black patient telling you they can't breathe would be: "Nonsense. Your lung capacity is the same as that of a white man of similar age, lifestyle, and health. Any data to the contrary is a slippery slope to fascism. That's why I'm going to ignore the AI's advice to put you on a ventilator now, and instead wait until the morning, which is when I'd usually put someone in your condition (but white) on a ventilator. The AI is sometimes racist and we have to watch out for this sort of thing. In the unlikely event you die tonight, you will have been a victim of structural racism rather than malpractice. I'm sure that after a long, distracting, demoralising, and expensive inquiry, a couple of our executive board members who you've never met will be forced to resign. Take two of these and be grateful for it, there's a good chap."

      I don't think your education example holds at all. If the data showed a higher incidence of, say, dyslexia, among a particular ethnic group, surely you'd want to target dyslexia support to those communities? I don't see how you leap from that to separate schools or hospitals.

      Marx argued "to each according to their needs." You seem to be arguing "to each the same", which is unlike any notion of progressivism I'm familiar with. It's the flip side of the poll tax.

    • 2 September 2021 at 1:07am
      neddy says: @ staberinde
      If a Black person presented with breathing problems, then I assume the health system would treat them as individuals, and seek a remedy for their particular condition. I would not assume, nor would I encourage, the system to match the person's skin tone to a color chart as an easy way of diagnosing the condition, and designing a treatment program.
