GCC AI Research

Archive Monthly

July 2024

12 articles

Top Stories

KAUST gene sequencing technology gives new hope to patients

KAUST · · Healthcare Research

KAUST and the King Faisal Specialist Hospital & Research Centre (KFSHRC) have developed NanoRanger, a gene sequencing system for identifying the mutations that cause genetic diseases. Building on existing long-read sequencing technologies, NanoRanger detects DNA abnormalities at base resolution through a faster, simpler, and cheaper process, and it targets diseases that are prevalent in Saudi Arabia due to consanguinity. Why it matters: The technology could improve the diagnosis and treatment of Mendelian diseases, which are especially prevalent in the Arab world.

Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning

arXiv · · NLP Arabic AI

This paper introduces a nested embedding learning framework for Arabic NLP that combines Matryoshka Embedding Learning with multilingual models. To enable comprehensive evaluation, the authors translated sentence similarity datasets into Arabic. On the Arabic Natural Language Inference dataset, Matryoshka embedding models outperform traditional models by 20-25% in capturing Arabic semantic nuances. Why it matters: This work advances Arabic NLP by providing a new method and evaluation benchmark for semantic similarity, which is crucial for tasks like information retrieval and text understanding.
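Matryoshka embedding learning trains a single embedding so that its leading coordinates (for example, the first 64 or 128 of a larger vector) also work as smaller embeddings on their own, letting a system truncate vectors to trade some accuracy for speed and storage. The sketch below shows only the inference-side step: truncate to a nested prefix, re-normalize, and compare with cosine similarity. The function names and toy vectors are hypothetical and are not taken from the paper or from any particular library.

```python
import math
from typing import Dict, List, Sequence


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` coordinates (the nested prefix) and L2-normalize them."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("prefix has zero norm; cannot normalize")
    return [x / norm for x in prefix]


def cosine_at_dim(a: Sequence[float], b: Sequence[float], dim: int) -> float:
    """Cosine similarity computed on the first `dim` coordinates only."""
    ua = truncate_and_normalize(a, dim)
    ub = truncate_and_normalize(b, dim)
    return sum(x * y for x, y in zip(ua, ub))


def nested_similarities(a: Sequence[float], b: Sequence[float],
                        dims: Sequence[int]) -> Dict[int, float]:
    """Score one sentence pair at several nested embedding sizes."""
    return {d: cosine_at_dim(a, b, d) for d in dims}


if __name__ == "__main__":
    # Toy 4-dimensional "sentence embeddings" (hypothetical values).
    a = [0.9, 0.1, 0.3, -0.2]
    b = [0.8, 0.2, -0.4, 0.5]
    for d, score in nested_similarities(a, b, [2, 4]).items():
        print(f"dim={d}: cosine={score:.3f}")
```

With a Matryoshka-trained model, the low-dimensional scores are meant to track the full-dimensional ones closely, which is what makes truncation useful; with arbitrary toy vectors like these, nothing guarantees that, and the two scores can differ sharply.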

ALLaM: Large Language Models for Arabic and English

arXiv · · NLP LLM

The paper introduces ALLaM, a series of large language models for Arabic and English, designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, using a decoder-only architecture. ALLaM achieves state-of-the-art results on Arabic benchmarks like MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.

AlcLaM: Arabic Dialectal Language Model

arXiv · · NLP LLM

The paper introduces AlcLaM, an Arabic dialectal language model trained on 3.4M sentences collected from social media. AlcLaM expands the vocabulary and retrains a BERT-based model on only 13 GB of dialectal text. Despite this comparatively small training corpus, AlcLaM outperforms models such as CAMeL, MARBERT, and ArBERT on a range of Arabic NLP tasks. Why it matters: AlcLaM offers a more efficient and accurate approach to Arabic NLP by focusing on dialectal Arabic, which is often underrepresented in existing models.

KAUST pushes Saudi to forefront of 6G technologies

KAUST · · Infrastructure Partnership

Ericsson is continuing to fund two telecommunications programs at KAUST, managed by Professors Mohamed-Slim Alouini and Atif Shamim, that focus on free-space optics (FSO) and reconfigurable intelligent surfaces (RIS). Both technologies are considered critical to delivering 5G and 6G capabilities. FSO uses lasers to transmit signals through open air rather than through cables, while RIS research develops programmable surfaces that reflect and steer wireless signals. Why it matters: This partnership positions Saudi Arabia at the forefront of developing next-generation telecommunications infrastructure and capabilities, addressing key challenges in 5G and 6G deployment.
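As a rough illustration of how an RIS manages wireless signals, consider a textbook narrowband model with one transmit and one receive antenna. This model is an assumption; the summary does not describe the system models used in the KAUST programs. Each passive surface element applies a phase shift to the signal it reflects, and choosing every shift so that each reflected path arrives in phase with the direct path maximizes the received amplitude. The function names and channel values below are hypothetical.

```python
import cmath
from typing import List, Sequence


def optimal_ris_phases(h_direct: complex,
                       h_to_ris: Sequence[complex],
                       g_from_ris: Sequence[complex]) -> List[float]:
    """Phase shift per RIS element that aligns each reflected path with the direct path.

    Element i contributes h_i * exp(j*theta_i) * g_i, whose phase is
    arg(h_i * g_i) + theta_i; setting that equal to arg(h_direct) makes
    all paths add constructively.
    """
    target = cmath.phase(h_direct)
    return [target - cmath.phase(h * g) for h, g in zip(h_to_ris, g_from_ris)]


def received_amplitude(h_direct: complex,
                       h_to_ris: Sequence[complex],
                       g_from_ris: Sequence[complex],
                       phases: Sequence[float]) -> float:
    """|direct path + sum of phase-shifted reflected paths| for a narrowband SISO link."""
    total = h_direct + sum(
        h * cmath.exp(1j * theta) * g
        for h, g, theta in zip(h_to_ris, g_from_ris, phases)
    )
    return abs(total)


if __name__ == "__main__":
    # Hypothetical channel coefficients for a 4-element surface.
    h_d = 0.2 - 0.1j
    h = [0.5 + 0.5j, -0.3 + 0.8j, 0.9 - 0.2j, -0.4 - 0.6j]
    g = [0.7 - 0.1j, 0.2 + 0.6j, -0.5 + 0.5j, 0.3 + 0.3j]
    tuned = optimal_ris_phases(h_d, h, g)
    print("unconfigured:", received_amplitude(h_d, h, g, [0.0] * 4))
    print("phase-aligned:", received_amplitude(h_d, h, g, tuned))
```

With aligned phases, the received amplitude equals the sum of the individual path magnitudes, the largest value this model allows. Practical surfaces typically support only a few discrete phase levels and depend on channel estimates, so real configuration is harder than this continuous-phase, perfect-knowledge sketch.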

NativQA: Multilingual Culturally-Aligned Natural Query for LLMs

arXiv · · NLP LLM

The paper introduces NativQA, a language-independent framework for building culturally and regionally aligned QA datasets in native languages. Using the framework, the authors created MultiNativQA, a multilingual natural QA dataset of about 64k manually annotated QA pairs in seven languages. The queries come from native speakers in 9 regions and span 18 topics, and the dataset is designed for evaluating and tuning LLMs. Why it matters: The framework and dataset enable the creation of more culturally relevant and effective LLMs for diverse linguistic communities, including those in the Middle East.

KAUST partners with Partanna to develop carbon-neutral concrete

KAUST · · Partnership Research

KAUST and Partanna have launched a 12-month R&D partnership to enhance CO2 removal in concrete manufacturing. The collaboration will combine Partanna's formula with KAUST's Direct Air Capture (DAC) technology. Partanna's patented concrete avoids Portland cement and instead uses a binder made from natural and recycled materials that absorbs CO2. Why it matters: This partnership highlights Saudi Arabia's commitment to sustainable construction and carbon-negative technologies, potentially transforming building practices in the region and beyond.

GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning

arXiv · · NLP LLM

The paper introduces InstAr-500k, a new Arabic instruction dataset of 500,000 examples designed to improve LLM performance in Arabic. Researchers fine-tuned the open-source Gemma-7B model using InstAr-500k and evaluated it on downstream tasks, achieving strong results on Arabic NLP benchmarks. They then released GemmAr-7B-V1, a model specifically tuned for Arabic NLP tasks. Why it matters: This work addresses the lack of high-quality Arabic instruction data, potentially boosting the capabilities of Arabic language models.