The Russian Immune Diversity Atlas project aims to profile immune cells from people of different ancestries at the multi-omics level. The goal is to reconstruct a reference atlas of the healthy immune system and investigate how it is perturbed in type 2 diabetes (T2D). As part of the international Human Cell Atlas, the project seeks to identify novel mechanisms and genetic and epigenetic markers for early T2D diagnosis, prognosis, and therapy. Why it matters: Addressing genetic diversity in biomedical research, particularly in the context of the Human Cell Atlas, is crucial for personalized medicine and for ensuring that treatments are effective across diverse populations in the Middle East and globally.
Researchers developed Atlas-Chat, a collection of LLMs for dialectal Arabic, focusing on Moroccan Arabic (Darija). They constructed an instruction dataset by consolidating existing Darija language resources and translating English instructions. The Atlas-Chat models (2B, 9B, and 27B) outperform state-of-the-art and Arabic-specialized LLMs such as LLaMa, Jais, and AceGPT on Darija NLP tasks. Why it matters: This work addresses the gap in LLM support for low-resource Arabic dialects, providing an instruction-tuning methodology and benchmarks for future research.
MBZUAI is conducting research to improve cross-cultural understanding using AI, including studying the limitations of LLMs in recognizing cultural references. They developed "Culturally Yours," a tool that helps users comprehend cultural references in text, and the "All Languages Matter Benchmark" (ALM Bench), which evaluates multimodal LLMs across 100 languages. MBZUAI has also developed LLMs for low-resource languages, including Jais (for Arabic), Nanda (for Hindi), and Sherkala (for Kazakh). Why it matters: These initiatives promote inclusivity and help ensure that AI systems are culturally aware and can serve diverse populations effectively, particularly in the Middle East's multicultural context.
The ArabJobs dataset is a new corpus of over 8,500 Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the UAE. The dataset contains over 550,000 words and captures linguistic, regional, and socio-economic variation in the Arab labor market. It is available on GitHub and can be used for fairness-aware Arabic NLP and labor market research.
KAUST researchers undertook a week-long expedition in May 2017 from Al-Lith, Saudi Arabia, to explore the biodiversity of the Red Sea. The expedition involved 35 participants, including KAUST faculty and 10 international marine scientists, and collected over 3,000 specimens. More than 50 previously unrecorded species were found during the expedition. Why it matters: Cataloging the Red Sea's biodiversity is crucial given increasing development, and it provides insight into how marine organisms adapt to extreme conditions, which can inform climate change predictions.