Middle East AI


Datacenters in the Desert: Feasibility and Sustainability of LLM Inference in the Middle East

arXiv · Significant research

Summary

This paper analyzes the energy consumption and carbon footprint of LLM inference in the UAE compared to Iceland, Germany, and the USA. The study uses DeepSeek Coder 1.3B and the HumanEval dataset to evaluate code generation. It provides a comparative analysis of geographical trade-offs for climate-aware AI deployment, specifically addressing the challenges and potential of datacenters in desert regions.
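The regional comparison the paper performs boils down to multiplying a workload's energy draw by the local grid's carbon intensity. A minimal sketch of that calculation is below; the intensity figures (gCO2e/kWh) are illustrative placeholders, not numbers from the paper.

```python
# Sketch: comparing the carbon footprint of an identical inference workload
# across regions. Grid intensities below are illustrative placeholders
# (gCO2e per kWh), NOT figures reported in the paper.
GRID_INTENSITY = {
    "UAE": 450.0,
    "Iceland": 30.0,
    "Germany": 350.0,
    "USA": 380.0,
}

def inference_emissions(energy_kwh: float, region: str) -> float:
    """Estimated emissions in grams of CO2-equivalent for one workload."""
    return energy_kwh * GRID_INTENSITY[region]

# The same 2 kWh inference job lands very differently depending on the grid:
for region in GRID_INTENSITY:
    print(f"{region}: {inference_emissions(2.0, region):.0f} gCO2e")
```

The point of the comparison is that the workload is held fixed while only the grid changes, isolating the geographical trade-off.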

Keywords

LLM · Datacenter · Carbon Footprint · UAE · Sustainability


Related

Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM

arXiv

Researchers introduce Arabic Mini-ClimateGPT, a tailored Arabic LLM for climate change and sustainability. The model is fine-tuned on the Clima500-Instruct dataset and uses vector embedding retrieval during inference. Evaluations show the model outperforms baseline LLMs and is preferred by experts in 81.6% of cases.

From Words to Proverbs: Evaluating LLMs Linguistic and Cultural Competence in Saudi Dialects with Absher

arXiv

This paper introduces Absher, a new benchmark for evaluating LLMs' linguistic and cultural competence in Saudi dialects. The benchmark comprises over 18,000 multiple-choice questions spanning six categories, using dialectal words, phrases, and proverbs from various regions of Saudi Arabia. Evaluation of state-of-the-art LLMs reveals performance gaps, especially in cultural inference and contextual understanding, highlighting the need for dialect-aware training.

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

arXiv

The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) reveal performance declines on region-specific questions, highlighting the need for region-specific knowledge in LLM training.

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

arXiv

This paper benchmarks the performance of OpenAI's Whisper model on diverse Arabic speech recognition tasks, using publicly available data and novel dialect evaluation sets. The study explores zero-shot, few-shot, and full fine-tuning scenarios. Results indicate that while Whisper outperforms XLS-R models in zero-shot settings on standard datasets, its performance drops significantly when applied to unseen Arabic dialects.
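ASR benchmarks like this one are typically scored by word error rate (WER): the word-level edit distance between a reference transcript and the model's hypothesis, divided by the reference length. The metric name is the standard one for this task, not a detail taken from the summary. A minimal implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A dialect-robustness gap like the one reported shows up as WER computed on standard-Arabic test sets staying low while WER on the unseen-dialect sets climbs.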