The rapid advancement of large language models (LLMs) for code generation has largely centered on English programming queries. This paper targets a low-resource language scenario: Turkish queries in Flutter mobile app development. In this study, two representative LLMs (a 4B-parameter multilingual model and a 3B-parameter code-specialized model) are fine-tuned on a new Turkish question-and-answer dataset for Flutter/Dart. Fine-tuning with parameter-efficient techniques yields dramatic improvements in code generation quality: Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Bidirectional Encoder Representations from Transformers Score (BERTScore), and CodeBLEU scores all increase significantly. The rate of correct solutions rises from roughly 30–70% for the base models to 80–90% after fine-tuning. An analysis of the performance trade-offs between the two models shows that the multilingual model slightly outperforms the code-focused model in accuracy after fine-tuning, while the code-focused model offers faster inference. These results demonstrate that even with very limited non-English training data, customizing LLMs can bridge the code generation gap, enabling high-quality assistance for Turkish developers comparable to that available in English. The dataset has been released on GitHub to facilitate further research in multilingual code generation.
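The abstract names only "parameter-efficient techniques" without specifying which. The sketch below shows one common instantiation, LoRA via the Hugging Face `peft` library, fine-tuning a causal LM on instruction-style Turkish Q&A pairs; the checkpoint name, data file, prompt template, and all hyperparameters are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal LoRA fine-tuning sketch (assumed setup, not the paper's exact one).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "example-org/multilingual-4b"  # hypothetical 4B multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train only low-rank adapters injected into the attention projections;
# the frozen base weights stay untouched, keeping memory needs small.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of all weights

# Turkish Flutter/Dart Q&A pairs; "Soru"/"Cevap" = "Question"/"Answer".
data = load_dataset("json", data_files="flutter_dart_tr_qa.json")["train"]

def tokenize(ex):
    text = f"Soru: {ex['question']}\nCevap: {ex['answer']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=1024)

data = data.map(tokenize, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    # mlm=False makes the collator copy input_ids into labels for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

After training, only the small adapter weights need to be saved (`model.save_pretrained("lora-out")`), which is part of what makes parameter-efficient fine-tuning practical in low-resource settings like this one.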
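For reference, most of the reported metrics can be computed with the Hugging Face `evaluate` library; CodeBLEU is the exception, as it ships as a standalone implementation. The prediction/reference strings below are placeholders, not examples from the paper's dataset.

```python
# Sketch of the reported metrics via `evaluate`; inputs are placeholders.
import evaluate

preds = ["Widget build(BuildContext context) => Text('Merhaba');"]
refs = ["Widget build(BuildContext context) => Text('Merhaba');"]

bleu = evaluate.load("bleu").compute(predictions=preds,
                                     references=[[r] for r in refs])
rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)
meteor = evaluate.load("meteor").compute(predictions=preds, references=refs)
# lang="tr" selects a multilingual backbone, since the outputs mix Turkish
# prose and Dart code.
bert = evaluate.load("bertscore").compute(predictions=preds,
                                          references=refs, lang="tr")

print(bleu["bleu"], rouge["rougeL"], meteor["meteor"], bert["f1"])
# CodeBLEU additionally weights AST and data-flow matches; it is not part of
# `evaluate` and would come from a separate CodeBLEU implementation.
```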
| Primary Language | English |
| --- | --- |
| Subjects | Computer Software, Software Engineering (Other) |
| Journal Section | Research Article |
| Authors | |
| Early Pub Date | October 13, 2025 |
| Publication Date | October 15, 2025 |
| Submission Date | June 18, 2025 |
| Acceptance Date | July 14, 2025 |
| Published in Issue | Year 2025, Volume: 8, Issue: 4 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.