Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation
arXiv · Significant research
Summary
This paper introduces Saudi-Dialect-ALLaM, a LoRA fine-tuned version of ALLaM-7B-Instruct-preview, a Saudi Arabian foundation model, designed to improve generation in Saudi dialects (Najdi and Hijazi). The model is trained on a private dataset of 5,466 synthetic instruction-response pairs, and two training variants are explored: Dialect-Token, in which an explicit dialect tag is prepended to each prompt, and No-Token, which omits it. Results indicate that the Dialect-Token model achieves better dialect control and fidelity than generic instruction-tuned models, although the dataset and model weights are not released.
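Since the summary describes the training setup only at a high level, the sketch below shows what dialect-token LoRA fine-tuning of this kind typically looks like with the Hugging Face PEFT library. The Hub model id, the [NAJDI]/[HIJAZI] tags, and the LoRA hyperparameters are illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch of dialect-token LoRA fine-tuning; ids, tags, and
# hyperparameters below are assumptions, not the paper's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "ALLaM-AI/ALLaM-7B-Instruct-preview"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach low-rank adapters to the attention projections; the base weights
# stay frozen and only the small adapter matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

def format_example(instruction: str, response: str, dialect: str | None) -> str:
    """Dialect-Token variant prepends an explicit tag; No-Token omits it."""
    prefix = f"[{dialect.upper()}] " if dialect else ""
    return f"{prefix}{instruction}\n{response}"
```

At inference time, the same tag (e.g. "[NAJDI]") would be prepended to the prompt to steer generation toward the desired dialect, which is the control mechanism the Dialect-Token variant is evaluated on.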
Keywords
LLM · Arabic · Saudi Dialect · LoRA · Fine-tuning