Evaluating the evaluators: a comparative study of AI and Teacher Assessments in Higher Education
DOI:
https://doi.org/10.1344/der.2024.45.124-140
Keywords:
Artificial intelligence tool-based assessment systems, teacher evaluation, assessments in higher education
Abstract
This study examines potential differences between teacher evaluations and artificial intelligence (AI) tool-based assessment systems in university examinations. The research evaluated a wide spectrum of exams, including numerical and verbal course exams, exams with different assessment styles (project, test exam, traditional exam), and both theoretical and practical course exams. These exams were selected using a criterion sampling method and were analyzed using Bland-Altman analysis and the Intraclass Correlation Coefficient (ICC) to assess how closely AI and teacher evaluations agreed across this broad range. The findings indicate a high level of agreement between the total exam scores assigned by AI and by teachers; by exam type, consistency was medium for visually based exams, low for video exams, high for test exams, and low for traditional exams. This research matters because it helps identify specific areas where AI can complement teacher assessment or needs improvement, guiding the development of more accurate and fair evaluation tools.
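As a rough illustration of the two agreement statistics named in the abstract, the sketch below computes the Bland-Altman bias with 95% limits of agreement and an intraclass correlation for paired AI and teacher scores. It is not the authors' procedure: the abstract does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption, and the function names and sample scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(ai_scores, teacher_scores):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    Bias is the mean AI-minus-teacher difference; the limits of
    agreement are bias +/- 1.96 sample standard deviations.
    """
    diffs = [a - t for a, t in zip(ai_scores, teacher_scores)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(ai_scores, teacher_scores):
    """ICC(2,1), assumed form: two-way random effects, absolute
    agreement, single rater, computed from ANOVA mean squares."""
    rows = list(zip(ai_scores, teacher_scores))  # one row per exam paper
    n, k = len(rows), 2                          # n papers, k = 2 raters
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-paper mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores for five exam papers (not data from the study).
ai = [70, 85, 60, 90, 75]
teacher = [72, 80, 65, 88, 78]
bias, lower, upper = bland_altman(ai, teacher)
icc = icc_2_1(ai, teacher)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f}), ICC={icc:.3f}")
```

A bias near zero with narrow limits suggests the two raters differ little on individual papers, while the ICC summarizes overall consistency on a single scale. Thresholds for labelling an ICC "low", "medium", or "high" vary by guideline and are not stated in the abstract.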
License
Copyright (c) 2024 Tugra Karademir Coskun, Ayfer Alper
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication.
- The texts published in Digital Education Review, DER, are under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 Spain license. The full conditions of use are given in the Creative Commons license.
- In order to mention the works, you must give credit to the authors and to this Journal.
- Digital Education Review, DER, does not accept any responsibility for the points of view and statements made by the authors in their work.