Evaluating the evaluators: a comparative study of AI and Teacher Assessments in Higher Education

DOI:

https://doi.org/10.1344/der.2024.45.124-140

Keywords:

Artificial intelligence tool-based assessment systems, teacher evaluation, assessments in higher education

Abstract

This study examines potential differences between teacher evaluations and artificial intelligence (AI) tool-based assessment systems in university examinations. The research evaluated a wide spectrum of exams, including numerical and verbal course exams, exams with different assessment styles (project, test exam, traditional exam), and both theoretical and practical course exams. The exams were selected using a criterion sampling method and analyzed with Bland-Altman analysis and the intraclass correlation coefficient (ICC) to assess how closely AI and teacher evaluations agreed across this range. The findings indicate a high level of consistency between the total exam scores assigned by artificial intelligence and by teachers. By exam type, consistency was medium for visually based exams, low for video exams, high for test exams, and low for traditional exams. This research is important because it identifies specific areas where artificial intelligence can complement teacher assessment and areas where it needs improvement, guiding the development of more accurate and fair evaluation tools.
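The two agreement statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `bland_altman` and `icc_2_1` and the scores are hypothetical. The ICC form used here (two-way random effects, absolute agreement, single rater) is an assumption, since the abstract does not state which form was applied.

```python
import numpy as np


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b.

    The limits of agreement are bias +/- 1.96 * SD of the paired differences.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` has one row per exam paper and one column per rater.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for papers (rows), raters (columns), and residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores for illustration only; these are not data from the study.
teacher = [70, 80, 90, 60, 75]
ai = [72, 78, 93, 58, 80]

bias, lower, upper = bland_altman(teacher, ai)
icc = icc_2_1(np.column_stack([teacher, ai]))
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f}), ICC(2,1)={icc:.3f}")
```

The two statistics answer different questions. Bland-Altman analysis shows whether one rater scores systematically higher and how far individual scores may diverge. The ICC summarizes overall consistency in a single coefficient, which is how labels such as low, medium, and high are usually assigned.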

Published

2024-07-01