Vol. 15, No. 2, May 2026. ISSN: 2217-8309, eISSN: 2217-8333

TEM Journal

TECHNOLOGY, EDUCATION, MANAGEMENT, INFORMATICS

Association for Information Communication Technology Education and Science

Evaluating the Potential of Open LLMs in Multilingual and On-Device Environments

Youngho Lee

© 2026 Youngho Lee, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).

Citation Information: TEM Journal. Volume 15, Issue 2, Pages 1493-1505, ISSN 2217-8309, DOI: 10.18421/TEM152-45, May 2026.

Received: 04 May 2025.
Revised: 27 November 2025.
Accepted: 13 December 2025.
Published: 27 May 2026.

Abstract:

Existing literature highlights Socratic questioning as a key method for fostering critical thinking, yet its scalability is limited by teacher workload. While Large Language Models (LLMs) offer a way to automate it, recent studies point to performance disparities between Open and Closed models, particularly in multilingual and resource-constrained environments. This study addresses two research questions: whether Open LLMs can perform comparably to Closed LLMs in generating Socratic questions for Korean students, and whether on-device models are practically feasible. It was hypothesized that optimized Open LLMs would achieve semantic parity with proprietary Closed models and that quantized on-device versions would retain sufficient quality for offline educational use. To test this, a comparative analysis was conducted on a dataset of 2,400 argumentative essays written by South Korean elementary (grades 4–6), middle, and high school students. Four models were evaluated using BLEU, ROUGE-L, cosine similarity, and BERTScore: Phi-4 (14B), Phi-4 GGUF, LLaMA-3.1 (8B), and GPT-o3-mini. The results show that Phi-4 (14B) achieved performance comparable to the Closed-model benchmark and that the on-device Phi-4 GGUF proved viable for privacy-conscious environments. Notably, question quality was higher for elementary-level essays and remained consistent across student performance levels, provided the input essay maintained a logical structure. These findings support the practical use of open and lightweight LLMs to deliver scalable and equitable AI-based feedback.

Keywords – Socratic questioning, large language models (LLMs), AI in education, on-device AI, multilingual learning environments.
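
Two of the metrics named in the abstract, ROUGE-L and cosine similarity, are simple enough to sketch directly. The Python example below is a minimal illustration under stated assumptions, not the study's evaluation code: it assumes whitespace tokenization (for Korean this yields eojeol-level tokens; the study's tokenizer is not stated), an F-measure that weights precision and recall equally, and a bag-of-words count vector as a hypothetical stand-in for the sentence embeddings that cosine similarity would normally be computed over. The function names and example sentences are hypothetical. BLEU and BERTScore are not reproduced here.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the subsequence on a match, otherwise carry the best previous length.
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F-measure with equal weight on precision and recall (assumed setting)."""
    cand, ref = candidate.split(), reference.split()  # assumption: whitespace tokens
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def bag_of_words(text, vocab):
    """Hypothetical stand-in for a sentence embedding: token counts over a shared vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical generated question and reference question.
    generated = "why do you think school uniforms limit freedom"
    reference = "why do you believe uniforms limit student freedom"
    vocab = sorted(set(generated.split()) | set(reference.split()))
    print(round(rouge_l_f1(generated, reference), 2))  # 0.75
    print(round(cosine_similarity(bag_of_words(generated, vocab),
                                  bag_of_words(reference, vocab)), 2))  # 0.75
```

ROUGE-L only credits tokens that appear in the same order in both texts, so a paraphrased question that uses different wording scores low even when its meaning matches; this is the usual motivation for reporting embedding-based measures such as BERTScore alongside it.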

Copyright © 2026 UIKTEN
Copyright licence: All articles are licensed under the Creative Commons CC BY-NC-ND 4.0 licence.