Vol.15, No.2, May 2026. ISSN: 2217-8309 eISSN: 2217-8333
TEM Journal
TECHNOLOGY, EDUCATION, MANAGEMENT, INFORMATICS
Association for Information Communication Technology Education and Science
Evaluating the Potential of Open LLMs in Multilingual and On-Device Environments
Youngho Lee
© 2026 Youngho Lee, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. (CC BY-NC-ND 4.0)
Citation Information: TEM Journal. Volume 15, Issue 2, Pages 1493-1505, ISSN 2217-8309, DOI: 10.18421/TEM152-45, May 2026.
Received: 04 May 2025.
Abstract:
Existing literature highlights Socratic questioning as a key method for fostering critical thinking, yet its scalability is limited by teacher workload. While Large Language Models (LLMs) offer automation potential, recent studies point to performance disparities between Open and Closed models, particularly in multilingual and resource-constrained environments. This study addresses two primary research questions: whether Open LLMs can perform comparably to Closed LLMs in generating Socratic questions for Korean students, and whether on-device models are practically feasible. It was hypothesized that optimized Open LLMs would achieve semantic parity with proprietary Closed models and that quantized on-device versions would maintain sufficient quality for offline educational use. To test this, a comparative analysis was conducted using a dataset of 2,400 argumentative essays written by South Korean elementary (grades 4–6), middle, and high school students. Four models were evaluated using BLEU, ROUGE-L, Cosine Similarity, and BERTScore: Phi-4 (14B), Phi-4 GGUF, LLaMA-3.1 (8B), and GPT-o3-mini. The results demonstrate that Phi-4 (14B) achieved performance comparable to the Closed model benchmark, and that the on-device Phi-4 GGUF proved viable for privacy-conscious environments. Notably, question quality was higher for elementary-level essays and was consistent across student performance levels, as long as the input essay maintained a logical structure. These findings support the practical application of open and lightweight LLMs to deliver scalable and equitable AI-based feedback.
Keywords – Socratic questioning, large language models (LLMs), AI in education, on-device AI, multilingual learning environments.