Vol.13, No.4, November 2024. ISSN: 2217-8309 eISSN: 2217-8333
TEM Journal
TECHNOLOGY, EDUCATION, MANAGEMENT, INFORMATICS
Association for Information Communication Technology Education and Science
Proposed Approach for Overcoming the Impact of Unbalanced Distribution in Predicting Students' Performance
Gabrijela Dimić, Ljiljana Pecić
© 2024 Gabrijela Dimić, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. (CC BY-NC-ND 4.0)
Citation Information: TEM Journal. Volume 13, Issue 4, Pages 2839-2849, ISSN 2217-8309, DOI: 10.18421/TEM134-20, November 2024.
Received: 26 May 2024. Revised: 23 September 2024.
Abstract:
This paper presents a method for mitigating the impact of an unbalanced distribution of multidimensional class features on grade prediction accuracy. For the case study, an educational data set named APOD was created by integrating data from heterogeneous sources, and the input features and the multidimensional class feature were defined. The effectiveness of the Synthetic Minority Over-Sampling Technique (SMOTE) in handling data imbalance was explored using various classification methods. Three experiments were carried out to determine which algorithm performed best with respect to the minority class distribution. The SMOTE approach with automatic minority class detection and a 100% sampling factor considerably improved model performance for four of the five classifiers tested. The primary objective of the study is to address the problem of predicting students' final grades when a small dataset causes data imbalance. Small datasets represent some classes with too few instances, which yields unreliable models that predict student success poorly. The proposed approach for implementing SMOTE is based on an algorithm for identifying minority classes, with a predetermined minimum number of samples per class. This approach enables the development of precise models for predicting students' final test results, even with small educational datasets. The contribution of the research lies in achieving greater accuracy in predicting students' final grades, regardless of dataset size and the presence of minority classes.
Keywords – Classification, SMOTE, unbalanced distribution, machine learning, educational data mining.
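As an illustration of the idea summarized in the abstract, the following minimal sketch shows one way that minority classes could be detected automatically and then oversampled with a 100% sampling factor. It is not the authors' implementation. The function names (`find_minority_classes`, `smote_class`, `oversample_minorities`), the rule that a class is a minority when it has fewer than `min_samples` instances, the neighbour count `k`, and the toy data are all assumptions introduced here for illustration only.

```python
import math
import random
from collections import Counter


def find_minority_classes(labels, min_samples):
    """Return classes with fewer than min_samples instances (assumed rule)."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n < min_samples)


def smote_class(points, percentage=100, k=5, rng=None):
    """Create synthetic points for one class by neighbour interpolation."""
    rng = rng or random.Random(0)
    per_point = percentage // 100  # 100% -> one synthetic point per original
    synthetic = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        neighbours = sorted(others, key=lambda q: math.dist(p, q))[:k]
        for _ in range(per_point):
            if not neighbours:  # single-instance class: nothing to interpolate with
                synthetic.append(list(p))
                continue
            q = rng.choice(neighbours)
            gap = rng.random()  # position on the segment between p and q
            synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic


def oversample_minorities(X, y, min_samples, percentage=100, k=5, seed=0):
    """Apply SMOTE only to the classes detected as minorities."""
    rng = random.Random(seed)
    X_out, y_out = [list(row) for row in X], list(y)
    for cls in find_minority_classes(y, min_samples):
        points = [row for row, label in zip(X, y) if label == cls]
        for row in smote_class(points, percentage, k, rng):
            X_out.append(row)
            y_out.append(cls)
    return X_out, y_out


# Toy data (made up): grade 10 is rare, grade 6 is common.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [5.0, 5.0], [5.2, 4.8]]
y = [6, 6, 6, 6, 10, 10]
X_res, y_res = oversample_minorities(X, y, min_samples=3)
print(Counter(y_res))
```

In this toy run only grade 10 falls below the assumed threshold of three instances, so it alone is oversampled and grows from two instances to four, while grade 6 is left unchanged. Each synthetic point lies on the line segment between an original minority instance and one of its nearest neighbours from the same class, which is the core interpolation step of SMOTE.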