The rapid advancement of large language models (LLMs) for code generation has largely centered on English programming queries. This paper targets a low-resource language scenario: Turkish queries in Flutter mobile app development. In this study, two representative LLMs (a 4B-parameter multilingual model and a 3B-parameter code-specialized model) are fine-tuned on a new Turkish question-and-answer dataset for Flutter/Dart. Fine-tuning with parameter-efficient techniques yields dramatic improvements in code generation quality: Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Bidirectional Encoder Representations from Transformers Score (BERTScore), and CodeBLEU scores all increase significantly. The rate of correct solutions rises from roughly 30–70% for the base models to 80–90% after fine-tuning. An analysis of the performance trade-offs between the two models shows that the multilingual model slightly outperforms the code-focused model in accuracy after fine-tuning, while the code-focused model offers faster inference. These results demonstrate that even with very limited non-English training data, customizing LLMs can bridge the code generation gap, enabling high-quality assistance for Turkish developers comparable to that available in English. The dataset has been released on GitHub to facilitate further research in multilingual code generation.
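The abstract names only "parameter-efficient techniques" without specifying which. The sketch below shows one common instantiation, LoRA via the Hugging Face `peft` library, fine-tuning a causal LM on instruction-style Turkish Q&A pairs; the checkpoint name, data file, prompt template, and all hyperparameters are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal LoRA fine-tuning sketch (assumed setup, not the paper's exact one).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "example-org/multilingual-4b"  # hypothetical 4B multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train only low-rank adapters injected into the attention projections;
# the frozen base weights stay untouched, keeping memory needs small.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of all weights

# Turkish Flutter/Dart Q&A pairs; "Soru"/"Cevap" = "Question"/"Answer".
data = load_dataset("json", data_files="flutter_dart_tr_qa.json")["train"]

def tokenize(ex):
    text = f"Soru: {ex['question']}\nCevap: {ex['answer']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=1024)

data = data.map(tokenize, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    # mlm=False makes the collator copy input_ids into labels for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

After training, only the small adapter weights need to be saved (`model.save_pretrained("lora-out")`), which is part of what makes parameter-efficient fine-tuning practical in low-resource settings like this one.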
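For reference, most of the reported metrics can be computed with the Hugging Face `evaluate` library; CodeBLEU is the exception, as it ships as a standalone implementation. The prediction/reference strings below are placeholders, not examples from the paper's dataset.

```python
# Sketch of the reported metrics via `evaluate`; inputs are placeholders.
import evaluate

preds = ["Widget build(BuildContext context) => Text('Merhaba');"]
refs = ["Widget build(BuildContext context) => Text('Merhaba');"]

bleu = evaluate.load("bleu").compute(predictions=preds,
                                     references=[[r] for r in refs])
rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)
meteor = evaluate.load("meteor").compute(predictions=preds, references=refs)
# lang="tr" selects a multilingual backbone, since the outputs mix Turkish
# prose and Dart code.
bert = evaluate.load("bertscore").compute(predictions=preds,
                                          references=refs, lang="tr")

print(bleu["bleu"], rouge["rougeL"], meteor["meteor"], bert["f1"])
# CodeBLEU additionally weights AST and data-flow matches; it is not part of
# `evaluate` and would come from a separate CodeBLEU implementation.
```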
| Primary Language | English |
| --- | --- |
| Subjects | Computer Software, Software Engineering (Other) |
| Journal Section | Research Article |
| Authors | |
| Early Pub Date | October 13, 2025 |
| Publication Date | October 15, 2025 |
| Submission Date | June 18, 2025 |
| Acceptance Date | July 14, 2025 |
| Published in Issue | Year 2025, Volume: 8, Issue: 4 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.