Feb 13, 2023 | Paul Seaburn

ChatGPT Almost Passes the US Medical Licensing Exam

Are you one of those millions (perhaps even billions) of people who fear losing their jobs – possibly soon – to artificial intelligence? If you are a medical doctor, you probably aren’t. That may be a big mistake. A study published this week in a major medical journal revealed that ChatGPT – the chatbot many are rapidly growing to hate and fear just a few short months after its release – took the United States Medical Licensing Examination (USMLE) … and came very close to passing it. Is the arrival of the Emergency Medical Hologram (EMH) – better known as The Doctor on the television series "Star Trek: Voyager" – just around the corner? Will your medical insurance cover a ChatGPT exam?

Meet the new doc ... not the same as the old doc.

“The goal of the USMLE exam is to assess ‘a physician’s ability to apply knowledge, concepts, and principles, and to demonstrate fundamental patient-centered skills that are important in health and disease and that constitute the basis of safe and effective patient care.’”

PLOS Digital Health opens its announcement of the study by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth by introducing the public to the United States Medical Licensing Examination. Set up in the early 1990s, the USMLE replaced multiple examinations with a standard three-step examination program for medical licensure in the United States. Physicians with a Doctor of Medicine (MD) degree are required to pass the USMLE for medical licensure. The three steps are:

  • Step 1: Assesses foundational medical science typically obtained during the first two years of medical school
  • Step 2: Evaluates the applicant's knowledge of clinical medicine
  • Step 3: Assesses the application of clinical knowledge to patient management

The USMLE is scored pass/fail, and over 95% of U.S. medical school graduates pass on their first attempt; the passing threshold is about 60%. ChatGPT landed right around that bar, even though it never attended medical school and never put in the 300 to 400 hours of study that most USMLE test takers do. It answered all 350 questions it was given. (Another 26 image-based questions were removed since ChatGPT couldn’t see them.) The questions were presented in various formats, including open-ended prompting ("What would be the patient's diagnosis based on the information provided?") and multiple choice ("The patient's condition is mostly caused by which of the following pathogens?"). So, how did ChatGPT do on the USMLE?

"ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations."

“At or near the passing threshold” means ChatGPT scored between 52.4% and 75.0% across the three USMLE exams, against a passing threshold of about 60%. ChatGPT also showed 94.6% concordance, or internal consistency, across its responses and – here’s a scary one – produced at least one significant insight (something that was new, non-obvious, and clinically valid) in 88.9% of its responses. That’s right … ChatGPT wrote some things that made the test evaluators go “Hmmm.” While it was close to its human competition, it blew away its AI counterpart: PubMedGPT, a model trained exclusively on biomedical domain literature, scored only 50.8% on an older set of USMLE-style questions, while ChatGPT nearly passed what was basically the full exam.

“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation.”

The authors of the study were obviously impressed with ChatGPT’s performance and saw it becoming useful in medical education, and yes, in clinical practice. In fact, they noted that clinicians at AnsibleHealth already use ChatGPT to help simplify medical terminology for patients. And then came the big confession from Dr. Kung:

"ChatGPT contributed substantially to the writing of [our] manuscript... We interacted with ChatGPT much like a colleague, asking it to synthesize, simplify, and offer counterpoints to drafts in progress...All of the co-authors valued ChatGPT's input."

That’s right … the authors of the study on ChatGPT used ChatGPT to write the study! Isn’t that cheating? Simon McCallum, a senior lecturer in software engineering at Victoria University of Wellington, New Zealand, puts it this way in an interview with Barron's:

"Society is about to change, and instead of warning about the hypochondria of randomly searching the internet for symptoms, we may soon get our medical advice from Doctor Google or Nurse Bing."

He also notes that there are other AI medical tools being tested. One called Med-PaLM impressed him: "ChatGPT may pass the exam, but Med-PaLM is able to give advice to patients that is as good as a professional GP." Considering many people no longer trust their GP (general practitioner), that may not be a positive endorsement. Another study published recently on arXiv had another large language model, Flan-PaLM, take the USMLE. While ChatGPT is a general-purpose model, Flan-PaLM was adapted and evaluated using a collection of medical question-answering datasets called MultiMedQA. As a result, Flan-PaLM achieved 67.6% accuracy in answering the USMLE questions, a passing grade that was higher than the scores of both ChatGPT and PubMedGPT. Vivek Natarajan, one of the co-authors of that study, concludes that all of these AI models "present a significant opportunity to rethink the development of medical AI and make it easier, safer and more equitable to use."

I recommend seeing a ChatGPT specialist.

This implies that they are not there yet, and many healthcare professionals – and educators in general – are concerned that ChatGPT is already being listed as an author on research papers. Natarajan tries to allay fears by stating that, while researchers are definitely planning for the day when AI becomes part of the medical team, today’s experiments have lesser goals:

“(To) spark further conversations and collaborations between patients, consumers, AI researchers, clinicians, social scientists, ethicists, policymakers and other interested people in order to responsibly translate these early research findings to improve healthcare."

If ChatGPT is already being listed as a co-author on medical research papers and is contributing to evaluations of its own performance, hasn't that train already left the station? Do yourself a favor and do what Natarajan suggests – start a conversation with your friends, your doctor, and other medical providers about using AI in diagnosis and treatment … before you no longer have a choice and you hear this as you are being wheeled into the operating room:

“Calling Dr. Google! Calling Dr. Google!”

Paul Seaburn

Paul Seaburn is the editor at Mysterious Universe and its most prolific writer. He’s written for TV shows such as "The Tonight Show", "Politically Incorrect" and an award-winning children’s program. He's been published in "The New York Times" and "Huffington Post" and has co-authored numerous collections of trivia, puzzles and humor. His "What in the World!" podcast is a fun look at the latest weird and paranormal news, strange sports stories and odd trivia. Paul likes to add a bit of humor to each MU post he crafts. After all, the mysterious doesn't always have to be serious.
