Assessment of Nursing Skill and Knowledge of ChatGPT, Gemini, Microsoft Copilot, and Llama: A Comparative Study

Dilan S. Hiwa; Sarhang Sedeeq Abdalla; Aso S. Muhialdeen; Hussein M. Hamasalih; Sanaa O. Karim

doi:10.58742/bmj.v2i2.87

Original Articles


Author details

Dilan S. Hiwa

College of Medicine, University of Sulaimani, Madam Mitterrand Street, Sulaymaniyah, Kurdistan, Iraq

Sarhang Sedeeq Abdalla

Smart Health Tower, Madam Mitterrand Street, Sulaymaniyah, Kurdistan, Iraq

Aso S. Muhialdeen

Smart Health Tower, Madam Mitterrand Street, Sulaymaniyah, Kurdistan, Iraq

Hussein M. Hamasalih

College of Nursing, University of Sulaimani, Sulaymaniyah, Kurdistan, Iraq

Sanaa O. Karim

College of Nursing, University of Sulaimani, Madam Mitterrand Street, Sulaymaniyah, Kurdistan, Iraq


Issue: Volume 2, Issue 3, 2024

Published: May 7, 2024


Abstract

Introduction

Artificial intelligence (AI) has emerged as a transformative force in healthcare. This study assesses the performance of four advanced AI systems (ChatGPT-3.5, Gemini, Microsoft Copilot, and Llama 2) on a comprehensive 100-question nursing competency examination. The objective is to gauge their potential contribution to nursing education and healthcare and to consider the implications for the future.

Methods

In February 2024, four AI systems (ChatGPT-3.5, Gemini, Microsoft Copilot, and Llama 2) were each given the same 100-question nursing examination under a standardized protocol. The questions covered diverse nursing competencies and were drawn from reputable clinical manuals to ensure content reliability. Each system was evaluated by its accuracy rate, that is, the percentage of questions it answered correctly.
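As a minimal sketch of the scoring step, assume each system gives exactly one answer per multiple-choice question and that a response counts as correct only when it matches the answer key exactly; the accuracy rate is then the share of matching responses. The helper names (`accuracy_rate`, `score_all`) and the toy answers are hypothetical illustrations, not the authors' procedure or study data. On a 100-question examination, the accuracy rate in percent equals the number of questions answered correctly.

```python
from typing import Dict, List


def accuracy_rate(answer_key: List[str], responses: List[str]) -> float:
    """Return the percentage of responses that exactly match the answer key.

    Hypothetical helper: assumes one response per question, in answer-key order.
    """
    if len(answer_key) != len(responses):
        raise ValueError("expected exactly one response per question")
    if not answer_key:
        raise ValueError("answer key is empty")
    correct = sum(1 for key, resp in zip(answer_key, responses) if key == resp)
    return 100.0 * correct / len(answer_key)


def score_all(answer_key: List[str], responses_by_system: Dict[str, List[str]]) -> Dict[str, float]:
    """Score every system against the same answer key (same exam for all systems)."""
    return {name: accuracy_rate(answer_key, resp) for name, resp in responses_by_system.items()}


# Toy data: 4 hypothetical questions and 2 unnamed systems, NOT results from the study.
key = ["B", "A", "D", "C"]
toy_responses = {
    "System 1": ["B", "A", "D", "A"],
    "System 2": ["B", "C", "A", "C"],
}
print(score_all(key, toy_responses))  # {'System 1': 75.0, 'System 2': 50.0}
```

Requiring an exact match keeps scoring mechanical; free-text answers would need a separate, human-judged grading rule.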

Results

Microsoft Copilot achieved the highest accuracy (84%), followed by ChatGPT-3.5 (77%), Gemini (75%), and Llama 2 (68%). None of the systems answered every question correctly, and each answered at least one question correctly that none of the other three did.
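The finding that each system answered some question no other system got right can be checked with set arithmetic: for each system, take the set of questions it answered correctly and subtract the union of the other systems' correct sets. This sketch assumes per-question responses are available for every system; the function names and toy answers are hypothetical and do not reproduce the study's data.

```python
from typing import Dict, List, Set


def correct_questions(answer_key: List[str], responses: List[str]) -> Set[int]:
    """Return the 1-based numbers of questions whose response matches the key."""
    return {
        number
        for number, (key, resp) in enumerate(zip(answer_key, responses), start=1)
        if key == resp
    }


def uniquely_correct(answer_key: List[str], responses_by_system: Dict[str, List[str]]) -> Dict[str, Set[int]]:
    """For each system, return the questions it got right that no other system got right."""
    correct = {name: correct_questions(answer_key, resp) for name, resp in responses_by_system.items()}
    unique: Dict[str, Set[int]] = {}
    for name, mine in correct.items():
        # Union of every other system's correct questions (empty set if there are none).
        others = set().union(*(qs for other, qs in correct.items() if other != name))
        unique[name] = mine - others
    return unique


# Toy data: 4 hypothetical questions and 3 unnamed systems, NOT results from the study.
key = ["A", "B", "C", "D"]
toy_responses = {
    "System 1": ["A", "B", "X", "X"],
    "System 2": ["X", "B", "C", "X"],
    "System 3": ["X", "B", "X", "D"],
}
print(uniquely_correct(key, toy_responses))
# {'System 1': {1}, 'System 2': {3}, 'System 3': {4}}
```

In the toy data, question 2 is answered correctly by all three systems, so it never appears in any system's unique set.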

Conclusion

The variation in the systems' answers underscores the importance of choosing an AI system according to the requirements and domain of the specific application, as no single system consistently outperformed the others across every area of nursing knowledge.



This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
