A study investigated language shift from Tibetan to Arabic among Tibetan families who migrated to Saudi Arabia 70 years ago. Data from 96 participants across three age groups revealed significant intergenerational differences in language use. Younger members rarely used Tibetan, while older members used it slightly more, with a p-value of .001 indicating statistical significance.
Keywords
language shift · Tibetan · Arabic · intergenerational · Saudi Arabia
This paper introduces Absher, a new benchmark for evaluating LLMs' linguistic and cultural competence in Saudi dialects. The benchmark comprises over 18,000 multiple-choice questions spanning six categories, using dialectal words, phrases, and proverbs from various regions of Saudi Arabia. Evaluation of state-of-the-art LLMs reveals performance gaps, especially in cultural inference and contextual understanding, highlighting the need for dialect-aware training.
The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) revealed performance declines on region-specific questions, highlighting the need for region-specific knowledge in LLM training.
This paper introduces Saudi-Dialect-ALLaM, a LoRA fine-tuned version of the Saudi Arabian foundation model ALLaM-7B-Instruct-preview, designed to improve the generation of Saudi dialects (Najdi and Hijazi). The model is trained on a private dataset of 5,466 synthetic instruction-response pairs, with two variants explored: Dialect-Token and No-Token training. Results indicate that the Dialect-Token model achieves superior dialect control and fidelity compared to generic instruction models, although the dataset and model weights are not released.
Researchers studied user lifetime prediction in the location-based social network Jodel within Saudi Arabia, leveraging its disjoint communities. Machine learning models, particularly Random Forest, were trained to predict user lifetime as a regression and classification problem. A single countrywide model generalizes well and performs similarly to community-specific models.