TEM JOURNAL - Technology, Education, Management, Informatics

Vol.15, No.2, May 2026. ISSN: 2217-8309

eISSN: 2217-8333

Association for Information Communication Technology Education and Science

Evaluating the Potential of Open LLMs in Multilingual and On-Device Environments

Youngho Lee

Citation Information: TEM Journal. Volume 15, Issue 2, Pages 1493-1505, ISSN 2217-8309, DOI: 10.18421/TEM152-45, May 2026.

Received: 04 May 2025.
Revised: 27 November 2025.
Accepted: 13 December 2025.
Published: 27 May 2026.

Abstract:

Existing literature highlights Socratic questioning as a key method for fostering critical thinking, yet its scalability is limited by teacher workload. While Large Language Models (LLMs) offer automation potential, recent studies point to performance disparities between Open and Closed models, particularly in multilingual and resource-constrained environments. This study addresses two primary research questions: whether Open LLMs can perform comparably to Closed LLMs in generating Socratic questions for Korean students, and whether on-device models are practically feasible. It was hypothesized that optimized Open LLMs would achieve semantic parity with proprietary Closed models and that quantized on-device versions would maintain sufficient quality for offline educational use. To test this, a comparative analysis was conducted using a dataset of 2,400 argumentative essays written by South Korean elementary (grades 4–6), middle, and high school students. Four models (Phi-4 14B, Phi-4 GGUF, LLaMA-3.1 8B, and GPT-o3-mini) were evaluated using BLEU, ROUGE-L, Cosine Similarity, and BERTScore. The results demonstrate that Phi-4 (14B) achieved performance comparable to the Closed model benchmark, and that the on-device Phi-4 GGUF proved viable for privacy-conscious environments. Notably, question quality was higher for elementary-level essays and remained consistent across student performance levels, provided that the input essay maintained a logical structure. These findings support the practical application of open and lightweight LLMs to deliver scalable and equitable AI-based feedback.

Keywords – Socratic questioning, large language models (LLMs), AI in education, on-device AI, multilingual learning environments.
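
For readers unfamiliar with the overlap metrics named in the abstract, the following is a minimal sketch of how two of them could be computed for a generated question against a reference question. It is not the study's evaluation code. It assumes whitespace tokenization (Korean text would likely need a morphological tokenizer) and uses a bag-of-words cosine as a stand-in for whatever vector representation the study used. The function names `rouge_l_f1` and `bow_cosine` are hypothetical.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall.

    Assumption: tokens are whitespace-separated.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def bow_cosine(candidate, reference):
    """Cosine similarity between bag-of-words count vectors.

    Assumption: a simple stand-in for an embedding-based cosine similarity.
    """
    a, b = Counter(candidate.split()), Counter(reference.split())
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Both functions return a score between 0 and 1, with higher values indicating closer agreement with the reference question. Surface-overlap scores of this kind reward shared wording rather than shared meaning, which is one reason to report them alongside an embedding-based metric such as BERTScore.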
