Title: Exploring the role of DeepSeek-R1, ChatGPT-4, and Google Gemini in medical education: How valid and reliable are they?
Authors: Sultan Ayoub Meo, Farah A Abukhalaf, Riham A ElToukhy, Kamran Sattar
Journal: Pakistan Journal of Medical Sciences (PJMS)
| Category | From | To |
|---|---|---|
| Y | 2024-10-01 | 2025-12-31 |
Publisher: Professional Medical Publications
Country: Pakistan
Year: 2025
Volume: 41
Issue: 7
Language: en
Keywords: Knowledge; ChatGPT; Multiple choice questions; Google Gemini; DeepSeek
Objective: In recent years, Artificial Intelligence (AI) has driven rapid advances in science, technology, industry, healthcare settings, and medical education. DeepSeek-R1, a Chinese-built large language model, has drawn the attention of the scientific community as an affordable, open alternative to the earlier established US-based AI models ChatGPT-4 and Google Gemini 1.5 Pro. This study aimed to explore the role of DeepSeek-R1, ChatGPT-4, and Google Gemini 1.5 Pro and to assess the validity and reliability of these AI tools in medical education.
Methods: This cross-sectional study was performed in the Department of Physiology, College of Medicine, King Saud University, Riyadh, Saudi Arabia, from January 25, 2025, to February 28, 2025. A Multiple-Choice Question (MCQ) bank was created with a pool of 60 basic medical sciences MCQs and 40 clinical medical sciences MCQs. The one hundred MCQs were drawn from various medical textbooks, journals, and examination pools. Each MCQ was entered individually into the input interface of DeepSeek-R1, ChatGPT-4, and Google Gemini 1.5 Pro to assess their level of knowledge across the various disciplines of medical sciences.
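For readers who want to run this kind of evaluation programmatically, the sketch below shows one way to tally a chatbot's answers against an MCQ answer key split into basic and clinical sections. It is an illustrative assumption, not the authors' procedure: the study entered each MCQ manually into the chat interfaces, and the `MCQ` structure, `score` function, and toy questions here are hypothetical.

```python
# Minimal sketch (assumed, not the authors' code): scoring one model's MCQ
# answers against an answer key, split by section (basic vs. clinical).
from dataclasses import dataclass

@dataclass
class MCQ:
    stem: str        # question text
    options: dict    # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    correct: str     # correct option letter
    section: str     # "basic" or "clinical"

def score(mcqs, answers):
    """Return {section: (correct, attempted)} for one model's answer sheet.

    `answers` maps each MCQ index to the option letter the model chose,
    e.g. {0: "B", 1: "C"}. Unanswered items count as attempted but wrong.
    """
    totals = {}
    for i, mcq in enumerate(mcqs):
        right, attempted = totals.get(mcq.section, (0, 0))
        is_right = answers.get(i, "").strip().upper() == mcq.correct
        totals[mcq.section] = (right + int(is_right), attempted + 1)
    return totals

# Toy usage with two made-up items (the real bank had 60 basic + 40 clinical MCQs).
bank = [
    MCQ("Which ion triggers neurotransmitter release at the synapse?",
        {"A": "Na+", "B": "Ca2+", "C": "K+", "D": "Cl-"}, "B", "basic"),
    MCQ("First-line treatment of anaphylaxis?",
        {"A": "Oral antihistamine", "B": "IV fluids only",
         "C": "Intramuscular adrenaline", "D": "Observation"}, "C", "clinical"),
]
model_answers = {0: "B", 1: "C"}   # answers transcribed from the chat interface
for section, (right, attempted) in score(bank, model_answers).items():
    print(f"{section}: {right}/{attempted} ({100 * right / attempted:.1f}%)")
```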
Results: In basic medical sciences, the marks obtained were 47/60 (78.33%) for DeepSeek-R1, 47/60 (78.33%) for ChatGPT-4, and 49/60 (81.7%) for Google Gemini 1.5 Pro. In clinical medical sciences, the marks obtained were 35/40 (87.5%) for DeepSeek-R1, 36/40 (90.0%) for ChatGPT-4, and 33/40 (82.5%) for Google Gemini 1.5 Pro. The total marks obtained were 82/100 (82.0%) for DeepSeek-R1, 84/100 (84.0%) for ChatGPT-4, and 82/100 (82.0%) for Google Gemini 1.5 Pro.
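For reference, the percentages quoted above follow directly from the reported marks; the short snippet below simply reproduces that arithmetic, with all figures copied from the abstract.

```python
# Marks as reported in the abstract; the snippet only converts marks to percentages.
reported = {
    "DeepSeek-R1":           {"basic": (47, 60), "clinical": (35, 40), "total": (82, 100)},
    "ChatGPT-4":             {"basic": (47, 60), "clinical": (36, 40), "total": (84, 100)},
    "Google Gemini 1.5 Pro": {"basic": (49, 60), "clinical": (33, 40), "total": (82, 100)},
}
for model, sections in reported.items():
    for section, (marks, out_of) in sections.items():
        print(f"{model:23s} {section:8s} {marks}/{out_of} = {100 * marks / out_of:.2f}%")
```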
Conclusions: The Chinese-based DeepSeek-R1 and the US-based ChatGPT-4 and Google Gemini 1.5 Pro achieved similar scores, each exceeding 80%, across the medical sciences subjects tested. The findings indicate that the knowledge, validity, and reliability of DeepSeek-R1, ChatGPT-4, and Google Gemini 1.5 Pro are comparable, supporting their potential future use in medical education.
doi: https://doi.org/10.12669/pjms.41.7.12183
How to cite this: Meo SA, Abukhalaf FA, ElToukhy RA, Sattar K. Exploring the role of DeepSeek-R1, ChatGPT-4, and Google Gemini in medical education: How valid and reliable are they? Pak J Med Sci. 2025;41(7):1887-1892. doi: https://doi.org/10.12669/pjms.41.7.12183
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.