Middle East AI

Language Shift or Maintenance? An Intergenerational Study of the Tibetan Community in Saudi Arabia

arXiv · Notable

Summary

A study investigated language shift from Tibetan to Arabic among Tibetan families who migrated to Saudi Arabia 70 years ago. Data from 96 participants across three age groups revealed significant intergenerational differences in language use: younger members rarely used Tibetan, while older members used it slightly more (p = .001).

Keywords

language shift · Tibetan · Arabic · intergenerational · Saudi Arabia

Related

From Words to Proverbs: Evaluating LLMs Linguistic and Cultural Competence in Saudi Dialects with Absher

arXiv

This paper introduces Absher, a new benchmark for evaluating LLMs' linguistic and cultural competence in Saudi dialects. The benchmark comprises over 18,000 multiple-choice questions spanning six categories, using dialectal words, phrases, and proverbs from various regions of Saudi Arabia. Evaluation of state-of-the-art LLMs reveals performance gaps, especially in cultural inference and contextual understanding, highlighting the need for dialect-aware training.

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

arXiv

The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) revealed performance declines on region-specific questions, highlighting the need for region-specific knowledge in LLM training.

Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation

arXiv

This paper introduces Saudi-Dialect-ALLaM, a LoRA fine-tuned version of the Saudi Arabian foundation model ALLaM-7B-Instruct-preview, designed to improve the generation of Saudi dialects (Najdi and Hijazi). The model is trained on a private dataset of 5,466 synthetic instruction-response pairs, with two variants explored: Dialect-Token and No-Token training. Results indicate that the Dialect-Token model achieves superior dialect control and fidelity compared to generic instruction models, although the dataset and model weights are not released.

Understanding & Predicting User Lifetime with Machine Learning in an Anonymous Location-Based Social Network

arXiv

Researchers studied user lifetime prediction in the location-based social network Jodel within Saudi Arabia, leveraging its disjoint communities. Machine learning models, particularly Random Forest, were trained to predict user lifetime as both a regression and a classification problem. A single countrywide model generalized well and performed similarly to community-specific models.