The paper introduces ORCA, a new public benchmark for evaluating Arabic language understanding. ORCA covers diverse Arabic varieties and includes 60 datasets across seven NLU task clusters. The benchmark was used to compare 18 multilingual and Arabic language models and includes a public leaderboard with a unified evaluation metric. Why it matters: ORCA addresses the lack of a comprehensive Arabic benchmark, enabling better progress measurement for Arabic and multilingual language models.
KAUST's Visualization Core Lab (KVL) has released inshimtu, a pseudo in situ visualization system for scientists working with large datasets and supercomputer simulations. Inshimtu simplifies the implementation of in situ visualization by using existing simulation output files without requiring changes to the simulation code. It helps scientists determine if implementing a full in situ visualization into their code is worthwhile. Why it matters: This open-source tool can improve the efficiency of supercomputing research in the region by allowing researchers to assess the value of in situ visualization before fully committing to it.
KAUST is hosting the Marine Megafauna Movement Workshop (October 19-20), featuring international speakers who will showcase research on marine animal behavior using sensors and analytics. KAUST's Enrichment in the Fall 2015 program (October 16-24) will focus on marine animal movement, with lectures, trips, movies, and music. KAUST aims to merge research on marine animal movement with the study of human mobility to gain new insights. Why it matters: This interdisciplinary approach could advance understanding of both marine ecosystems and human behavior, while promoting marine conservation efforts in the Red Sea.
KAUST researchers from the Red Sea Research Center (RSRC) and Computational Bioscience Research Center (CBRC) found macroalgae DNA prevalent in the open ocean, up to 5,000 km from coastal areas. According to the study, 69% of drifting macroalgae sinks below a depth of 1,000 m, sequestering carbon in deep ocean waters. The study used metagenomes generated by the global ocean expeditions Tara Oceans and Malaspina, analyzed via KAUST's DMAP platform and the Shaheen supercomputer. Why it matters: The findings confirm the role of macroalgae in carbon sequestration, highlighting their importance in blue carbon assessments for climate change mitigation and underscoring KAUST's contribution to environmental sustainability research.
Holger Pirk from Imperial College London is developing a novel approach to data management system composition called BOSS. The system uses a homoiconic representation of data and code and partial evaluation of queries by components, drawing inspiration from compiler-construction research. BOSS achieves a fully composable design that effectively combines different data models, hardware platforms, and processing engines, enabling features like GPU acceleration and generative data cleaning with minimal overhead. Why it matters: This research on composable database systems can broaden the applicability of data management techniques in the GCC region, enabling more flexible and efficient data processing for various applications.
Oscar Becerril Lio, a KAUST alumnus who graduated in 2011 with a master's degree in applied mathematics specializing in operations research, is now an operations manager in Mexico. He leverages his KAUST experience in industrial engineering, construction, operations research, optimization, and logistics. Lio advises current KAUST students to learn from the diverse community and take advantage of travel opportunities. Why it matters: This alumni profile showcases KAUST's role in developing professionals who contribute to diverse industries and geographies, highlighting the university's global impact.
A new neural network architecture, Orchid, uses adaptive convolutions to achieve quasilinear computational complexity, O(N log N), for sequence modeling. Orchid adapts its convolution kernel dynamically based on the input sequence. Evaluations across language modeling and image classification show that Orchid outperforms attention-based architectures like BERT and Vision Transformers, often with smaller model sizes. Why it matters: Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, representing progress toward more efficient and scalable deep learning models.
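For readers wondering where an O(N log N) cost for a sequence-length convolution can come from, here is a minimal sketch, not Orchid's implementation. It assumes the convolution is computed through the FFT, and it swaps in a hypothetical `toy_kernel` (an exponential decay whose rate depends on the input) for whatever learned network produces the kernel in the actual model. All function names are illustrative, and the code uses only the Python standard library.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_conv(signal, kernel):
    """Causal convolution truncated to len(signal), in O(N log N) time."""
    n = len(signal)
    kernel = list(kernel)[:n]  # taps beyond n cannot affect the first n outputs
    size = 1
    while size < 2 * n:  # zero-pad so circular convolution equals linear
        size *= 2
    fs = fft(list(signal) + [0.0] * (size - n))
    fk = fft(kernel + [0.0] * (size - len(kernel)))
    full = fft([a * b for a, b in zip(fs, fk)], inverse=True)
    return [v.real / size for v in full[:n]]


def direct_conv(signal, kernel):
    """O(N^2) reference implementation, used only to check fft_conv."""
    n = len(signal)
    return [
        sum(kernel[i] * signal[t - i] for i in range(min(t + 1, len(kernel))))
        for t in range(n)
    ]


def toy_kernel(signal):
    """Hypothetical input-dependent kernel (an assumption, not Orchid's):
    exponential decay whose rate shrinks as the mean input magnitude grows."""
    n = len(signal)
    if n == 0:
        return []
    rate = 1.0 / (1.0 + sum(abs(v) for v in signal) / n)
    return [rate ** i for i in range(n)]


def adaptive_conv(signal):
    """Convolve the signal with a kernel derived from the signal itself."""
    return fft_conv(signal, toy_kernel(signal))
```

Calling `adaptive_conv([1.0, 2.0, 0.0, -1.0, 3.0])` builds a kernel from the input and applies it in a single pass. The sequence length enters the cost only through the three FFTs, which is where the N log N term arises.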
KAUST researchers, in collaboration with Spanish scientists, have released the Global Ocean Gene Catalog 1.0, the world's largest open-source catalog of marine microbes. The catalog, created using the KAUST Metagenomic Analysis Platform (KMAP), matches microbial class with gene function, geographic location, and habitat type, and includes 317 million unique gene clusters. It draws on 2,102 ocean samples taken from different depths and locations around the world. Why it matters: This resource will enable researchers to investigate ocean ecosystems, track pollution impact, and explore biotechnology applications, potentially driving significant advances in fields like antibiotic discovery and plastic degradation.