Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation
arXiv · Significant research
Summary
This paper introduces Saudi-Dialect-ALLaM, a LoRA fine-tuned version of ALLaM-7B-Instruct-preview, a Saudi Arabian foundation model, designed to improve generation in Saudi dialects (Najdi and Hijazi). The model is trained on a private dataset of 5,466 synthetic instruction-response pairs, and two training variants are explored: Dialect-Token, in which an explicit dialect tag is prepended to each prompt, and No-Token, which omits it. Results indicate that the Dialect-Token model achieves better dialect control and fidelity than generic instruction-tuned models, although the dataset and model weights are not released.
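Since the summary describes the training setup only at a high level, the sketch below shows what dialect-token LoRA fine-tuning of this kind typically looks like with the Hugging Face PEFT library. The Hub model id, the [NAJDI]/[HIJAZI] tags, and the LoRA hyperparameters are illustrative assumptions, not the authors' reported configuration.

```python
# Hypothetical sketch of dialect-token LoRA fine-tuning; ids, tags, and
# hyperparameters below are assumptions, not the paper's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "ALLaM-AI/ALLaM-7B-Instruct-preview"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach low-rank adapters to the attention projections; the base weights
# stay frozen and only the small adapter matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

def format_example(instruction: str, response: str, dialect: str | None) -> str:
    """Dialect-Token variant prepends an explicit tag; No-Token omits it."""
    prefix = f"[{dialect.upper()}] " if dialect else ""
    return f"{prefix}{instruction}\n{response}"
```

At inference time, the same tag (e.g. "[NAJDI]") would be prepended to the prompt to steer generation toward the desired dialect, which is the control mechanism the Dialect-Token variant is evaluated on.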
Keywords
LLM · Arabic · Saudi Dialect · LoRA · Fine-tuning