Evaluating the Robustness of Learning Analytics Results Against Fake Learners

Download "Evaluating the Robustness of Learning Analytics Results Against Fake Learners" at MIT DSpace.

Massive Open Online Courses (MOOCs) collect large amounts of rich data. A primary objective of Learning Analytics (LA) research is to study these data in order to improve the pedagogy of interactive learning environments. Most studies assume that the data represent truthful and honest learning activity. However, previous studies have shown that MOOCs can have large cohorts of users who break this assumption and achieve high performance through behaviors such as Cheating Using Multiple Accounts or unauthorized collaboration; we therefore call them fake learners. Because of their aberrant behavior, fake learners can bias the results of LA models. The goal of this study is to evaluate the robustness of LA results when the data contain a considerable number of fake learners. Our methodology follows the rationale of ‘replication research’. We challenge the results reported in a well-known LA/pedagogic-efficacy MOOC paper, one of the first of its kind, by replicating its results with and without the fake learners (identified using machine learning algorithms). The results show that fake learners exhibit very different behavior compared to true learners. However, even though they make up a significant portion of the student population (∼15%), their effect on the results is not dramatic (it does not change the trends). We conclude that the LA study that we challenged was robust against fake learners. While these results carry an optimistic message about the trustworthiness of LA research, they rely on data from one MOOC. We believe that this issue should receive more attention within the LA research community, and that it can explain some ‘surprising’ research results in MOOCs.

About José Ruipérez-Valiente

José A. Ruipérez-Valiente completed his B.Eng. and M.Eng. in Telecommunications at Universidad Católica de San Antonio de Murcia (UCAM) and Universidad Carlos III of Madrid (UC3M) respectively, graduating in both cases with the best academic transcript of the class. Afterwards, he completed his M.Sc. and Ph.D. in Telematics at UC3M while conducting research at Institute IMDEA Networks in the area of learning analytics and educational data mining. During this time, he completed two research stays of three months each, the first at MIT and the second at the University of Edinburgh. He has received several academic and research awards and has published more than 25 scientific publications in important journals and conferences in his area of research. He has also held industry appointments at Vocento, Accenture and ExoClick, combining experience in academia, research institutions and industry. He is currently a postdoctoral associate at the CMS/W department at MIT, where he is part of the Teaching Systems Lab and also collaborates with the Education Arcade, applying data science to large-scale free online courses and to game-based environments to enhance human knowledge of how we learn. He is passionate about how learning occurs, solving data-based problems, teaching and sharing knowledge, yoga, nature and photography.

