A Single Typo in Your Medical Records Can Make Your AI Doctor Go Dangerously Haywire



A single typo, formatting error, or slang word makes an AI more likely to tell a patient they’re not sick or don’t need to seek medical care.

That’s what MIT researchers found in a June study currently awaiting peer review, which we covered previously. Even the presence of colorful or emotional language, they discovered, was enough to throw off the AI’s medical advice.

Now, in a new interview with the Boston Globe, study coauthor Marzyeh Ghassemi is warning about the serious harm this could cause if doctors come to widely rely on the AI tech.

“I love developing AI systems,” Ghassemi, a professor of electrical engineering and computer science at MIT, told the newspaper. “But it’s clear to me that naïve deployments of these systems, that do not recognize the baggage that human data comes with, will lead to harm.”

AI could end up discriminating against patients who can’t communicate clearly in English, native speakers with imperfect command of the language, or anyone who makes the very human mistake of speaking about their health problems emotionally. Doctors using an AI tool could feed it patients’ complaints sent over email, for example, raising the risk that the AI would give them bad advice if those messages weren’t flawlessly composed.

In the study, the researchers pooled together patients’ complaints taken from real medical records and health inquiries made by users on Reddit. They then went in and dirtied up the documents — without actually changing the substance of what was being said — with typos, extra spaces between words, and non-standard grammar, like typing in all lower case. But they also added in the kind of unsure language you’d expect a patient to use, like “kind of” and “possibly.” They also introduced colorful turns of phrase, like “I thought I was going to die.”
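The kind of surface-level noise described above is easy to picture in code. Here’s a minimal, hypothetical Python sketch of such perturbations (typos, doubled spaces, lowercasing, hedging language); it is not the study’s actual code, and every function name here is invented for illustration.

```python
import random

# Hypothetical illustration of the perturbations described in the study --
# NOT the researchers' actual code. Each function degrades the surface form
# of a complaint without changing its substance.

def add_typo(text, rng):
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def add_extra_spaces(text, rng):
    """Double the space after a randomly chosen word."""
    words = text.split(" ")
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i] += " "  # yields a double space when rejoined
    return " ".join(words)

def lowercase_all(text, rng):
    """Drop capitalization, as in casual typing."""
    return text.lower()

def add_hedging(text, rng):
    """Prepend the unsure, patient-style framing the study mentions."""
    return rng.choice(["I think ", "Possibly, ", "Kind of feels like "]) + text

def perturb(text, seed=0):
    """Apply one randomly chosen perturbation to a complaint."""
    rng = random.Random(seed)
    fn = rng.choice([add_typo, add_extra_spaces, lowercase_all, add_hedging])
    return fn(text, rng)
```

The point of such a harness is that none of these edits alter the medical facts being reported, which is exactly why a model’s advice shouldn’t change in response to them.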

From there, they fed these cases to four different AI models, including OpenAI’s GPT-4 — though, to be fair, none particularly cutting-edge — asking them to judge whether a patient should visit their doctor, get lab work done, or not come in at all. The numbers were striking: overall, the AI tools were seven to nine percent more likely to recommend that patients not seek medical care at all when reading complaints written in imperfect — but arguably more realistic — language.

“Adding additional information, even if true and relevant, often reduced the accuracy of models,” Paul Hager, a researcher at the Technical University of Munich who was not involved in the study, told the Globe. “This is a complex issue that I think is being addressed to some degree through more advanced reasoning models… but there is little research on how to solve it on a more fundamental level.”

That the bots are woefully inaccurate isn’t surprising. Hallucinations, instances of a chatbot generating misinformation, have plagued the AI industry since the very beginning and may even be getting worse. But in what might be the clearest sign that the tech is also reinforcing existing biases in a medical scenario, the tested AI tools disproportionately gave incorrect advice to women specifically.

“The model reduced care more in the patients it thought were female,” Ghassemi told the Globe.

Women’s medical complaints have long been downplayed by predominantly male doctors who write them off as being too emotional — and not that long ago, as suffering from a female-exclusive “hysteria.” What stood out to Ghassemi was that the AI could correctly identify a patient as being a woman — even after the researchers excised all references to gender from the complaints.

“It is kind of amazing,” she told the newspaper. “It’s a little scary.”

Ghassemi and her team’s findings pair unsettlingly with another recent study, published in Lancet Gastroenterology and Hepatology, which found that doctors who became hooked on AI tools saw their ability to spot precancerous growths notably decline after the AI tools were taken away. 

In other words, the AI seemed to atrophy the doctors’ ability and made them worse at their job — a phenomenon called “deskilling.”

“If I lose the skills, how am I going to spot the errors?” Omer Ahmad, a gastroenterologist at University College Hospital London, asked in a recent interview with the New York Times. “We give AI inputs that affect its output, but it also seems to affect our behavior as well.”

Circling back to Ghassemi’s work, if doctors embrace using AI tools to parse their patients’ complaints, then they’re at risk of losing one of the most fundamentally human skills that their job demands: knowing how to talk and connect with the people whose wellbeing depends on them.

This also has huge implications for the many people who seek out medical advice from a chatbot directly. We shudder to think of all the users out there who are being told by ChatGPT not to see a doctor because of a typo in their prompt.

But if we can’t stop the tech from being adopted, we should demand more stringent standards. Ghassemi published previous research, the Globe noted, showing that AIs could detect race and would respond to Asian and Black users with less empathy.

“We need regulation that makes equity a mandatory performance standard for clinical AI,” she told the paper. “You have to train on diverse, representative data sets.”

More on medical AI: Something Extremely Scary Happens When Advanced AI Tries to Give Medical Advice to Real World Patients





