Title: Exploring the role of DeepSeek-R1, ChatGPT-4, and Google Gemini in medical education: How valid and reliable are they?
Authors: Sultan Ayoub Meo, Farah A Abukhalaf, Riham A ElToukhy, Kamran Sattar
Journal: Pakistan Journal of Medical Sciences (PJMS)
| Category | From | To |
|---|---|---|
| Y | 2024-10-01 | 2025-12-31 |
Publisher: Professional Medical Publications
Country: Pakistan
Year: 2025
Volume: 41
Issue: 7
Language: en
Keywords: Knowledge; ChatGPT; Multiple choice questions; Google Gemini; DeepSeek
Objective: In recent years, Artificial Intelligence (AI) has driven rapid advances in science, technology, industry, healthcare settings, and medical education. DeepSeek-R1, a Chinese-built large language model, has drawn the attention of the scientific community as an affordable, open alternative to the earlier established US-based AI models ChatGPT-4 and Google Gemini 1.5 Pro. This study aimed to explore the role of DeepSeek-R1, ChatGPT-4, and Google Gemini 1.5 Pro and to assess the validity and reliability of these AI tools in medical education.
Methods: This cross-sectional study was performed in the Department of Physiology, College of Medicine, King Saud University, Riyadh, Saudi Arabia, from January 25, 2025, to February 28, 2025. A Multiple-Choice Question (MCQ) bank was created with a pool of 60 basic medical sciences MCQs and 40 clinical medical sciences MCQs. The one hundred MCQs were drawn from various medical textbooks, journals, and examination pools. Each MCQ was entered individually into the input interface of DeepSeek-R1, ChatGPT-4, and Google Gemini 1.5 Pro to assess their level of knowledge across the various disciplines of medical sciences.
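For readers who want to run this kind of evaluation programmatically, the sketch below shows one way to tally a chatbot's answers against an MCQ answer key split into basic and clinical sections. It is an illustrative assumption, not the authors' procedure: the study entered each MCQ manually into the chat interfaces, and the `MCQ` structure, `score` function, and toy questions here are hypothetical.

```python
# Minimal sketch (assumed, not the authors' code): scoring one model's MCQ
# answers against an answer key, split by section (basic vs. clinical).
from dataclasses import dataclass

@dataclass
class MCQ:
    stem: str        # question text
    options: dict    # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    correct: str     # correct option letter
    section: str     # "basic" or "clinical"

def score(mcqs, answers):
    """Return {section: (correct, attempted)} for one model's answer sheet.

    `answers` maps each MCQ index to the option letter the model chose,
    e.g. {0: "B", 1: "C"}. Unanswered items count as attempted but wrong.
    """
    totals = {}
    for i, mcq in enumerate(mcqs):
        right, attempted = totals.get(mcq.section, (0, 0))
        is_right = answers.get(i, "").strip().upper() == mcq.correct
        totals[mcq.section] = (right + int(is_right), attempted + 1)
    return totals

# Toy usage with two made-up items (the real bank had 60 basic + 40 clinical MCQs).
bank = [
    MCQ("Which ion triggers neurotransmitter release at the synapse?",
        {"A": "Na+", "B": "Ca2+", "C": "K+", "D": "Cl-"}, "B", "basic"),
    MCQ("First-line treatment of anaphylaxis?",
        {"A": "Oral antihistamine", "B": "IV fluids only",
         "C": "Intramuscular adrenaline", "D": "Observation"}, "C", "clinical"),
]
model_answers = {0: "B", 1: "C"}   # answers transcribed from the chat interface
for section, (right, attempted) in score(bank, model_answers).items():
    print(f"{section}: {right}/{attempted} ({100 * right / attempted:.1f}%)")
```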
Results: In basic medical sciences, the marks obtained were 47/60 (78.33%) for DeepSeek-R1, 47/60 (78.33%) for ChatGPT-4, and 49/60 (81.7%) for Google Gemini 1.5 Pro. In clinical medical sciences, the marks obtained were 35/40 (87.5%) for DeepSeek-R1, 36/40 (90.0%) for ChatGPT-4, and 33/40 (82.5%) for Google Gemini 1.5 Pro. The total marks obtained were 82/100 (82.0%) for DeepSeek-R1, 84/100 (84.0%) for ChatGPT-4, and 82/100 (82.0%) for Google Gemini 1.5 Pro.
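For reference, the percentages quoted above follow directly from the reported marks; the short snippet below simply reproduces that arithmetic, with all figures copied from the abstract.

```python
# Marks as reported in the abstract; the snippet only converts marks to percentages.
reported = {
    "DeepSeek-R1":           {"basic": (47, 60), "clinical": (35, 40), "total": (82, 100)},
    "ChatGPT-4":             {"basic": (47, 60), "clinical": (36, 40), "total": (84, 100)},
    "Google Gemini 1.5 Pro": {"basic": (49, 60), "clinical": (33, 40), "total": (82, 100)},
}
for model, sections in reported.items():
    for section, (marks, out_of) in sections.items():
        print(f"{model:23s} {section:8s} {marks}/{out_of} = {100 * marks / out_of:.2f}%")
```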
Conclusions: The Chinese-based DeepSeek-R1 and the US-based ChatGPT-4 and Google Gemini 1.5 Pro achieved similar scores, each exceeding 80%, across the medical sciences subjects tested. The findings indicate that the knowledge, validity, and reliability of DeepSeek-R1, ChatGPT-4, and Google Gemini 1.5 Pro are comparable, supporting their potential future use in medical education.
doi: https://doi.org/10.12669/pjms.41.7.12183
How to cite this: Meo SA, Abukhalaf FA, ElToukhy RA, Sattar K. Exploring the role of DeepSeek-R1, ChatGPT-4, and Google Gemini in medical education: How valid and reliable are they? Pak J Med Sci. 2025;41(7):1887-1892. doi: https://doi.org/10.12669/pjms.41.7.12183
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.