Authors
José María Ortiz-Lozano, Pilar Aparicio-Chueca, Xavier María Triadó i Ivern, José Luis Arroyo Barrigüete
Abstract
Student dropout is a major concern in studies investigating retention strategies in higher education. This study identifies which variables are important for predicting student dropout, using academic data from 3,583 first-year students on the Business Administration (BA) degree at the University of Barcelona (Spain). The results indicate that two variables, the percentage of subjects failed and the percentage of subjects not attended in the first semester, demonstrate significant predictive power. To assess the robustness of the results, this has been corroborated with an additional sample of 10,784 students from three degree programs (Law, BA, and Economics) at the Complutense University of Madrid (Spain). Three different algorithms have also been utilized: neural networks, random forest, and logit. In the specific case of neural networks, the NeuralSens methodology has been employed, which is based on the use of sensitivities and allows the network to be interpreted. The outcomes are highly consistent in all cases: both a simple model (logit) and more sophisticated ones (neural networks and random forest) exhibit high accuracy (correctly predicted values) and sensitivity (correctly predicted dropouts). On the test set, average values of 77% and 69%, respectively, have been achieved. In this regard, a noteworthy point is that only academic data from the university itself was used to develop the models. This ensures that there is no dependence on other personal or organizational variables, which can often be difficult to access.
Keywords
Prediction; university dropout; educational data mining; academic performance; neural networks
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Other studies, such as that of Lizarte Simón and Gijón Puerta (2022), which uses a sample of students from Early Childhood, Primary, and Social Education and Pedagogy degree programs, achieve an accuracy of 91% using predictors derived from a survey that evaluates various academic dimensions. This means that, once again, the model requires access to a series of variables that are challenging to obtain.
Additional information
Funding
This work was supported by Ministerio de Ciencia e Innovación [grant number: PID202020-116293RB-I00]. The authors would like to thank Universidad Complutense de Madrid (UCM) for the data, which have been obtained from the Integrated Institutional Data System (SIDI).