The paper introduces ORCA, a public benchmark for evaluating Arabic natural language understanding (NLU). ORCA covers diverse Arabic varieties and comprises 60 datasets across seven NLU task clusters. The authors use it to compare 18 multilingual and Arabic language models, and the benchmark comes with a public leaderboard that ranks models using a unified evaluation metric. Why it matters: ORCA fills the gap left by the lack of a comprehensive Arabic benchmark, making it easier to measure progress in Arabic and multilingual language models.
KAUST researchers, in collaboration with Spanish scientists, have released the Global Ocean Gene Catalog 1.0, the world's largest open-source catalog of marine microbes. Built with the KAUST Metagenomic Analysis Platform (KMAP), the catalog contains 317 million unique gene clusters and links microbial class to gene function, geographic location, and habitat type. It draws on 2,102 ocean samples collected at different depths and locations around the world. Why it matters: The resource will let researchers investigate ocean ecosystems, track the impact of pollution, and explore biotechnology applications, potentially driving advances in areas such as antibiotic discovery and plastic degradation.
Researchers introduced Orchid, a neural network architecture for sequence modeling that uses adaptive convolutions to achieve quasilinear computational complexity, O(N log N) in the sequence length N. Orchid dynamically adapts its convolution kernel to the input sequence. In evaluations on language modeling and image classification, Orchid outperforms attention-based architectures such as BERT and Vision Transformers, often with smaller models. Why it matters: Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, whose cost grows quadratically with sequence length, marking progress toward more efficient and scalable deep learning models.
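A minimal sketch of why a data-dependent global convolution can run in O(N log N): when the kernel is as long as the sequence, computing the convolution directly costs O(N²), but multiplying in the frequency domain via the FFT costs O(N log N). The function names `make_kernel` and `fft_causal_conv` and the decaying base filter are hypothetical. In particular, `make_kernel` is only a stand-in for Orchid's conditioning network: it scales a fixed filter by a gate computed from the input, which shows input dependence but is not Orchid's actual design.

```python
import numpy as np


def make_kernel(x: np.ndarray, base: np.ndarray) -> np.ndarray:
    """Hypothetical data-dependent kernel (NOT Orchid's conditioning network).

    Scales a fixed base filter by a gate in (0, 2) computed from the mean of
    the whole input, so a different input yields a different kernel. Because
    the gate sees the entire sequence, the overall map is not causal.
    """
    gate = 1.0 + np.tanh(x.mean())
    return gate * base


def fft_causal_conv(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """y[t] = sum_{k=0..t} h[k] * x[t - k], computed in O(N log N).

    This step is causal for a fixed kernel h. Both signals are zero-padded
    to length 2N so the FFT's circular convolution equals linear convolution
    (no wrap-around); only the first N outputs are kept.
    """
    n = x.shape[0]
    size = 2 * n
    x_freq = np.fft.rfft(x, n=size)
    h_freq = np.fft.rfft(h, n=size)
    return np.fft.irfft(x_freq * h_freq, n=size)[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1024
    x = rng.standard_normal(n)
    base = np.exp(-np.arange(n) / 64.0)  # assumed long, decaying filter
    h = make_kernel(x, base)             # kernel as long as the sequence
    y = fft_causal_conv(x, h)
    # np.convolve is the O(N^2) direct reference; keep its first N outputs.
    print(np.allclose(y, np.convolve(x, h)[:n]))
```

The zero-padding to 2N is the key design choice: without it, the FFT product computes a circular convolution, and values from the end of the sequence would wrap around into the beginning.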
KAUST researchers from the Red Sea Research Center (RSRC) and the Computational Bioscience Research Center (CBRC) found macroalgae DNA to be prevalent in the open ocean, up to 5,000 km from the coast. According to the study, 69% of drifting macroalgae sinks below a depth of 1,000 m, sequestering carbon in deep ocean waters. The analysis relied on metagenomes generated by the global ocean expeditions Tara Oceans and Malaspina, processed with KAUST's DMAP platform and the Shaheen supercomputer. Why it matters: The findings confirm the role of macroalgae in carbon sequestration, highlighting their importance in blue carbon assessments for climate change mitigation and underscoring KAUST's contribution to environmental sustainability research.