Estudio de la fiabilidad de test multirrespuesta con el método de Monte Carlo

Calaf Chica, José; García Tárrago, María José

doi:10.4438/1988-592X-RE-2021-392-479

Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/8301

Title

Estudio de la fiabilidad de test multirrespuesta con el método de Monte Carlo

Alternative title

Reliability analysis of multiple-choice tests with the Monte Carlo method

Author

Calaf Chica, José

García Tárrago, María José

Published in

Revista de Educación. 2021, n. 392, p. 63-95

Publisher

Subdirección General de Documentación y Publicaciones

Publication date

2021

ISSN

0034-8082

DOI

10.4438/1988-592X-RE-2021-392-479

Abstract

For much of the twentieth century, a great deal was written about the reliability of multiple-choice tests as a method for assessing content. In particular, many theoretical and empirical studies have compared the scoring systems in use. In this investigation, an algorithm was designed to generate virtual examinees with three attributes: real knowledge, level of cautiousness, and erroneous knowledge. Real knowledge sets the probability that the examinee knows whether each answer option in the test is true or false. The level of cautiousness reflects the probability of answering an unknown question by guessing. Erroneous knowledge is false knowledge assimilated as true. The algorithm also takes test configuration parameters into account: the number of questions, the number of answer options per question, and the scoring system. The algorithm administers tests to the virtual examinees and analyses the deviation between real knowledge and estimated knowledge (the test score). The most commonly used scoring systems (positive marking, negative marking, free-choice tests, and the dual response method) were compared to assess the reliability of each. To validate the algorithm, it was compared with an analytical probabilistic model. The results showed that the presence or absence of erroneous knowledge substantially alters the reliability of the tests most widely accepted by the educational community (negative-marking tests). Because a test cannot reveal whether examinees hold erroneous knowledge, it is up to the examiner either to penalize its presence by using negative marking or to seek a closer estimate of real knowledge by using positive marking.
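The simulation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which this record does not reproduce. It rests on simplifying assumptions: knowledge is modelled per question rather than per answer option, blind guesses are uniform over the options, and only positive and negative marking are covered (free-choice tests and the dual response method are left out). The names `simulate_score`, `expected_score`, `answer_prob`, and `penalty` are hypothetical.

```python
import random
from statistics import mean


def simulate_score(knowledge, erroneous, answer_prob,
                   n_questions, n_options, penalty, rng):
    """Normalized score of one virtual examinee on one test (assumed model).

    knowledge   -- probability that a question is truly known
    erroneous   -- probability that a question is "known" wrongly
    answer_prob -- probability of answering an unknown question by guessing
    penalty     -- points subtracted per wrong answer (0 = positive marking)
    """
    total = 0.0
    for _ in range(n_questions):
        r = rng.random()
        if r < knowledge:
            total += 1.0                      # real knowledge: correct answer
        elif r < knowledge + erroneous:
            total -= penalty                  # erroneous knowledge: confident wrong answer
        elif rng.random() < answer_prob:
            # Unknown question answered by a uniform blind guess.
            if rng.random() < 1.0 / n_options:
                total += 1.0
            else:
                total -= penalty
        # Otherwise the question is left blank and scores 0.
    return total / n_questions


def expected_score(knowledge, erroneous, answer_prob, n_options, penalty):
    """Analytical expectation of the same assumed model, used as a cross-check."""
    unknown = 1.0 - knowledge - erroneous
    p_right = knowledge + unknown * answer_prob / n_options
    p_wrong = erroneous + unknown * answer_prob * (n_options - 1) / n_options
    return p_right - penalty * p_wrong


if __name__ == "__main__":
    rng = random.Random(0)
    k, e, a, n, m = 0.6, 0.1, 0.5, 40, 4      # illustrative values, not from the paper
    for label, penalty in (("positive marking", 0.0),
                           ("negative marking", 1.0 / (m - 1))):
        scores = [simulate_score(k, e, a, n, m, penalty, rng) for _ in range(2000)]
        deviation = mean(scores) - k          # estimated minus real knowledge
        print(f"{label}: mean deviation {deviation:+.3f}, "
              f"analytical {expected_score(k, e, a, m, penalty) - k:+.3f}")
```

Under these assumptions, and with a penalty of 1/(m - 1) per wrong answer, the expected score reduces to k - e/(m - 1). The estimate is therefore unbiased only when there is no erroneous knowledge, which is one way to see why that attribute matters for negative marking in a model of this kind.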

Keywords

Multiple-Choice Tests

Test Reliability

Computer Simulation

Scoring

Evaluation

Monte Carlo Methods

Subject