Development and evaluation of an artificial intelligence-based model for assessment improvement in Brazil's national online Continuing Medical Education program

Alisson Oliveira dos Santos
Tales Mota Machado
Josué de Lacerda Silva
João Paulo Valadares Vilaça
Henrique Pereira Alves
Moreno Magalhães de Souza Rodrigues
Leonardo Cançado Monteiro Savassi
Adelson Guaraci Jantsch
Alysson Feliciano Lemos

Abstract

Background:


Providing timely, high-quality feedback to thousands of healthcare professionals enrolled in distance-based Family Medicine specialization programs in Brazil is a challenge that artificial intelligence may help address. This study aimed to evaluate the effectiveness of Large Language Models (LLMs) in assisting tutors with student assessment in these programs.


Methods:


We implemented GPT-4o to analyze student responses to practical challenges in a Family Medicine distance education course. The system was structured through dataset preparation (518 student responses), prompt engineering, fine-tuning, and Retrieval-Augmented Generation. Evaluation included: human expert assessment using a 5-item Likert questionnaire (n=26 responses); metrics-based analysis comparing text length between LLM and tutor feedback (n=104); semantic similarity analysis between tutor- and LLM-generated texts (n=11); and comparison of scores assigned by tutors versus LLM (n=104).
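As an illustration of the metrics-based and semantic similarity comparisons described above, the following is a minimal sketch and not the study's pipeline. The abstract does not state how similarity was computed, so a bag-of-words cosine similarity is used here as a simple stand-in. The feedback texts and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in a feedback text."""
    return len(text.split())


def bag_of_words(text: str) -> Counter:
    """Lowercased word-frequency vector (a stand-in for a real text embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as sharing nothing
    return dot / (norm_a * norm_b)


# Hypothetical feedback pair for one student response (not study data).
tutor_feedback = "Good plan. Review the follow-up schedule for the patient."
llm_feedback = (
    "Your plan is good overall. Please review the follow-up schedule "
    "for the patient and justify it."
)

# Positive when the LLM text is longer than the tutor text.
length_gap = word_count(llm_feedback) - word_count(tutor_feedback)
similarity = cosine_similarity(bag_of_words(tutor_feedback), bag_of_words(llm_feedback))
```

Word-frequency vectors only capture shared vocabulary; an embedding-based measure would also credit paraphrases, which may matter when tutor and LLM word the same advice differently.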


Results:


Expert assessment showed high ratings for clarity (100% scoring "strongly agree") but lower ratings for the LLM's ability to replace tutors. LLM-generated feedback was significantly longer than tutor feedback (mean 190.11 vs. 109.69 words, p<.001). Semantic similarity between LLM and tutor responses was high (mean 85.92%). LLM-assigned scores differed slightly but significantly from tutor scores (mean 8.31 vs. 8.80, p<.001).
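A small mean difference can still be statistically significant when the two sets of scores are compared on the same responses. The sketch below computes a paired t statistic to show the arithmetic. It assumes a paired design, because the abstract does not state which test was used, and the scores are hypothetical rather than study data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """t statistic for the mean of the paired differences x[i] - y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical scores for the same five student responses.
tutor_scores = [9.0, 8.5, 9.5, 8.0, 9.0]
llm_scores = [8.5, 8.0, 9.5, 7.5, 8.0]

# Negative when the LLM scores are lower on average than the tutor scores.
t_stat = paired_t_statistic(llm_scores, tutor_scores)
```

Because the statistic divides the mean difference by its standard error, a consistent gap of half a point across many responses can yield a large absolute t value even though the gap itself is modest.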


Discussion:


LLMs can generate clear, semantically aligned feedback and assign grades that approximate tutor scoring, offering a scalable enhancement to assessment in distance-based medical education. Nevertheless, they should be seen as a complement to human tutors rather than a replacement, especially where nuanced, contextualized guidance is required. Careful attention to regional language variation and domain-specific content will be essential for the safe, equitable integration of AI into continuing professional development.

Article Details

How to Cite
Oliveira dos Santos, A., Mota Machado, T., de Lacerda Silva, J., Valadares Vilaça, J. P., Pereira Alves, H., Magalhães de Souza Rodrigues, M., … Feliciano Lemos, A. (2026). Development and evaluation of an artificial intelligence-based model for assessment improvement in Brazil’s national online Continuing Medical Education program. Education for Health, 39(1). Retrieved from https://educationforhealthjournal.org/index.php/efh/article/view/441
Section
Original Research Paper