Assessment of Nursing Skill and Knowledge of ChatGPT, Gemini, Microsoft Copilot, and Llama: A Comparative Study


Dilan S. Hiwa
Sarhang Sedeeq Abdalla
Aso S. Muhialdeen
Hussein M. Hamasalih
Sanaa O. Karim

Keywords

MCQ, Artificial intelligence, Nursing, AI

Abstract

Introduction


Artificial intelligence (AI) has emerged as a transformative force in healthcare. This study assesses the performance of four advanced AI systems (ChatGPT-3.5, Gemini, Microsoft Copilot, and Llama 2) on a comprehensive 100-question nursing competency examination. The objective is to gauge their potential contributions to nursing education and their implications for the future.


Methods


The study tested four AI systems (ChatGPT-3.5, Gemini, Microsoft Copilot, and Llama 2) on a 100-question nursing examination in February 2024. A standardized protocol was used to administer the examination, which covered diverse nursing competencies. Questions were drawn from reputable clinical manuals to ensure content reliability. Each system was evaluated on its accuracy rate.


Results


Microsoft Copilot achieved the highest accuracy (84%), followed by ChatGPT-3.5 (77%), Gemini (75%), and Llama 2 (68%). No system answered every question correctly, and each system answered at least one question correctly that none of the others did.
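Both the accuracy rates and the observation that each system had at least one uniquely correct answer come down to simple counting. The sketch below assumes that per-question results are recorded as a set of question IDs that each system answered correctly. The system names, question sets, and the `accuracy` and `uniquely_correct` helpers are hypothetical illustrations, not the study's data or scoring code. With the study's 100 questions, an accuracy percentage equals the number of questions answered correctly (for example, 84% corresponds to 84 questions).

```python
from typing import Dict, Set

# Hypothetical toy data (NOT the study's results): for each system, the set of
# question IDs it answered correctly on a 10-question quiz.
correct: Dict[str, Set[int]] = {
    "System A": {1, 2, 3, 4},
    "System B": {1, 2, 5},
    "System C": {1, 3, 6},
}
TOTAL_QUESTIONS = 10


def accuracy(answered_correctly: Set[int], total: int) -> float:
    """Percentage of questions answered correctly."""
    return 100.0 * len(answered_correctly) / total


def uniquely_correct(results: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """For each system, the questions that it alone answered correctly."""
    unique: Dict[str, Set[int]] = {}
    for name, questions in results.items():
        # Union of every other system's correct answers.
        others = set().union(*(qs for other, qs in results.items() if other != name))
        unique[name] = questions - others
    return unique


if __name__ == "__main__":
    # Rank systems by accuracy, highest first.
    ranking = sorted(correct, key=lambda n: accuracy(correct[n], TOTAL_QUESTIONS), reverse=True)
    for name in ranking:
        print(f"{name}: {accuracy(correct[name], TOTAL_QUESTIONS):.0f}%")
    # Each toy system has one question that no other system got right.
    for name, qs in uniquely_correct(correct).items():
        print(f"{name} uniquely correct on: {sorted(qs)}")
```

A set difference against the union of the other systems' correct answers is what "only they got correctly" means operationally; a non-empty result for every system matches the pattern reported above.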


Conclusion


The variation in the AI systems' answers underscores the importance of selecting an AI system according to the specific application and domain, as no single system consistently outperformed the others across every area of nursing knowledge.

