KAUST researchers have developed an energy-efficient wastewater treatment process that produces high-quality effluent suitable for reuse. A pilot plant in Jeddah, operating in collaboration with MODON since July 2022, treats 50,000 liters of wastewater per day off-grid and generates 1.5 kWh of electrical energy for every 1,000 liters treated. The plant uses an anaerobic membrane bioreactor (AnMBR) coupled with UV disinfection, which removes up to 99.9999% of microorganisms while producing less solid waste. Why it matters: This decentralized, energy-independent system offers a sustainable solution for water treatment in resource-scarce regions of the Middle East, in line with Saudi Arabia's sustainability goals.
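The headline figures can be turned into two more familiar quantities: daily energy output and the log10 reduction value (LRV) used to express disinfection performance. The short sketch below does that arithmetic. It assumes that the 1.5 kWh per 1,000 liters figure scales linearly with throughput, which is an assumption for illustration rather than a reported result; the function names are hypothetical.

```python
import math


def daily_energy_kwh(liters_per_day: float, kwh_per_1000_l: float) -> float:
    """Energy generated per day, ASSUMING output scales linearly with volume treated."""
    return liters_per_day / 1000 * kwh_per_1000_l


def log_reduction(percent_removed: float) -> float:
    """Convert a percent removal of microorganisms into a log10 reduction value."""
    surviving_fraction = 1 - percent_removed / 100
    return -math.log10(surviving_fraction)


energy = daily_energy_kwh(50_000, 1.5)  # 50 x 1.5 kWh
lrv = log_reduction(99.9999)            # one survivor per million
print(f"{energy:.1f} kWh/day, {lrv:.1f}-log reduction")
```

Under that linearity assumption, the pilot plant would generate about 75 kWh per day, and a 99.9999% removal corresponds to a 6-log reduction.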
Researchers introduce ArabicaQA, a large-scale dataset for Arabic question answering comprising 89,095 answerable and 3,701 unanswerable questions. They also present AraDPR, a dense passage retrieval model trained on Arabic Wikipedia, and benchmark large language models (LLMs) on Arabic question answering. Why it matters: This work addresses a significant gap in Arabic NLP resources and provides valuable tools and benchmarks for advancing research in the field.
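Dense passage retrieval, in the standard dual-encoder formulation, encodes the question and every passage into vectors and ranks passages by their inner product with the question vector. The sketch below shows only that ranking step, assuming AraDPR follows this standard design. The three-dimensional vectors are hand-picked toy values standing in for encoder outputs, and the names `retrieve` and `passages` are hypothetical.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product, the relevance score used in dual-encoder retrieval."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    query_vec: List[float], passage_vecs: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k passages with the highest inner-product score."""
    scores = [(pid, dot(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy embeddings; a real system would obtain these from trained encoders.
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.2, 0.8, 0.1],
    "p3": [0.7, 0.3, 0.2],
}
query = [1.0, 0.0, 0.1]
top = retrieve(query, passages, k=2)
```

Here `p1` (score 0.9) and `p3` (score 0.72) are returned ahead of `p2` (score 0.21). At Wikipedia scale, the passage vectors would be precomputed and searched with an approximate nearest-neighbor index rather than scored exhaustively.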
The paper introduces MedPromptX, a clinical decision support system for chest X-ray diagnosis that integrates imagery with electronic health record (EHR) data using multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG). MedPromptX dynamically refines its few-shot data so that it can adjust in real time to new patient scenarios, and it narrows the search area within X-ray images. The study also introduces MedPromptX-VQA, a new visual question answering dataset, and reports state-of-the-art performance, with an 11% improvement in F1-score over baselines.
Researchers at MBZUAI introduce FissionFusion, a hierarchical model merging approach for improving performance in medical image analysis. The method aggregates models locally and then globally according to their hyperparameter configurations, and uses a cyclical learning rate scheduler to generate candidate models efficiently. In experiments, FissionFusion outperforms standard model souping by approximately 6% on the HAM10000 and CheXpert datasets and also improves out-of-distribution (OOD) performance.
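To make the local-then-global idea concrete, the sketch below groups models by a hyperparameter label, averages weights within each group (local soups), and then averages the group results (global soup). It also includes a triangular cyclical learning rate of the kind often used to produce several models from a single run. This is a simplified illustration under stated assumptions, not FissionFusion's actual procedure: plain uniform averaging at both levels, flat weight lists, and the triangular schedule shape are all choices made here, and every function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Weights = List[float]


def average(weight_sets: List[Weights]) -> Weights:
    """Element-wise uniform average of several weight vectors."""
    n = len(weight_sets)
    return [sum(ws) / n for ws in zip(*weight_sets)]


def hierarchical_merge(models: List[Tuple[str, Weights]]) -> Weights:
    """ASSUMED scheme: average within each hyperparameter group, then across groups."""
    groups: Dict[str, List[Weights]] = defaultdict(list)
    for config, weights in models:
        groups[config].append(weights)
    local_soups = [average(ws) for ws in groups.values()]  # local aggregation
    return average(local_soups)                            # global aggregation


def triangular_lr(step: int, base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Triangular cyclical schedule: rises base -> max over half_cycle steps, then falls back."""
    cycle_pos = step % (2 * half_cycle)
    if cycle_pos < half_cycle:
        frac = cycle_pos / half_cycle
    else:
        frac = (2 * half_cycle - cycle_pos) / half_cycle
    return base_lr + (max_lr - base_lr) * frac


models = [
    ("lr=1e-3", [1.0, 2.0]),
    ("lr=1e-3", [3.0, 4.0]),
    ("lr=1e-4", [10.0, 10.0]),
]
merged = hierarchical_merge(models)              # [6.0, 6.5]
flat_soup = average([w for _, w in models])      # uniform soup over all models
```

The example shows why the hierarchy matters: the single `lr=1e-4` model carries half the weight in the hierarchical merge, whereas a flat uniform soup would let the larger `lr=1e-3` group dominate.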
Researchers at MBZUAI have introduced TiBiX, a novel approach that leverages temporal information from previous chest X-rays (CXRs) and reports for the bidirectional generation of current CXRs and reports. TiBiX addresses two key challenges: generating current images from previous images and reports, and generating current reports from both previous and current images. The study also introduces a curated temporal benchmark dataset derived from MIMIC-CXR and achieves state-of-the-art results in report generation.
The paper introduces AraPoemBERT, an Arabic language model pretrained exclusively on 2.09 million verses of Arabic poetry. AraPoemBERT was evaluated against five other Arabic language models on tasks including poet's gender classification (99.34% accuracy) and poetry sub-meter classification (97.79% accuracy). The model achieved state-of-the-art results in these and other downstream tasks, and is publicly available on Hugging Face. Why it matters: This specialized model advances Arabic NLP by providing a new state-of-the-art tool tailored for the nuances of classical Arabic poetry.
KAUST researchers have developed a green synthetic biology approach that uses engineered algae to replicate the complex fragrances of agarwood, also known as oudh. They catalogued the chemical diversity of sesquiterpenes (STPs) in 58 agarwood samples and then reproduced part of that chemical complexity in algae. Using the green alga Chlamydomonas reinhardtii, the team produced nine distinct STP products that are widely found in agarwood, offering a sustainable alternative to harvesting endangered trees. Why it matters: This research provides a sustainable route to sought-after fragrances, reducing pressure on endangered agarwood tree populations and promoting green chemistry in the region.
Researchers at MBZUAI have introduced MedMerge, a transfer learning technique that merges weights from independently initialized models to improve performance on medical imaging tasks. MedMerge learns kernel-level weights to combine features from different models into a single model. Experiments across various medical imaging tasks demonstrated performance gains of up to 7% in F1 score.
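A kernel-level merge can be pictured as giving each convolution kernel its own mixing coefficient, so that one kernel can draw mostly on the first model while its neighbor draws on the second. The sketch below shows only the combination step for one layer. It assumes a sigmoid-gated convex combination of two kernels, which is a choice made here for illustration rather than MedMerge's published formulation. In MedMerge, the per-kernel weights are learned; here the logits are fixed by hand, and all names are hypothetical.

```python
import math
from typing import List

Kernel = List[float]  # one flattened convolution kernel


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def merge_layer(
    kernels_a: List[Kernel], kernels_b: List[Kernel], logits: List[float]
) -> List[Kernel]:
    """Blend two layers kernel by kernel; each kernel has its own ASSUMED gate logit."""
    merged = []
    for ka, kb, z in zip(kernels_a, kernels_b, logits):
        alpha = sigmoid(z)  # in training, z would be optimized on the target task
        merged.append([alpha * wa + (1 - alpha) * wb for wa, wb in zip(ka, kb)])
    return merged


layer_a = [[1.0, 1.0], [0.0, 2.0]]
layer_b = [[3.0, 3.0], [4.0, 0.0]]
# Kernel 0: equal mix; kernel 1: a large logit keeps model A's kernel almost unchanged.
merged = merge_layer(layer_a, layer_b, logits=[0.0, 100.0])
```

Parameterizing each coefficient through a sigmoid keeps it in the range (0, 1) while leaving the underlying logit unconstrained for gradient-based optimization.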
This paper explores how tokenization strategies and vocabulary sizes affect Arabic language model performance on NLP tasks such as news classification and sentiment analysis. It compares four tokenizers and finds that Byte Pair Encoding (BPE) with Farasa performs best overall, owing to Farasa's morphological analysis capabilities. Surprisingly, the study found that vocabulary size had limited impact on performance when model size was held fixed, challenging common assumptions about the relationship between vocabulary size and model performance. Why it matters: The findings provide insights for developing more effective and nuanced Arabic language models, particularly for handling dialectal variation, and support responsible AI development in the region.
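For readers unfamiliar with BPE, the sketch below implements the classic merge-learning loop: start from characters, repeatedly count adjacent symbol pairs weighted by word frequency, and merge the most frequent pair. It is a generic textbook version on a toy English corpus, not the paper's tokenizer. The Farasa morphological segmentation that would precede BPE in the best-performing setup is not modeled, and the alphabetical tie-breaking rule is a choice made here for determinism.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vocab = Dict[Tuple[str, ...], int]


def get_pair_counts(vocab: Vocab) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Tuple[str, str], vocab: Vocab) -> Vocab:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged_vocab: Vocab = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged_vocab[key] = merged_vocab.get(key, 0) + freq
    return merged_vocab


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word-frequency table."""
    vocab: Vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Most frequent pair; ties broken alphabetically (an arbitrary choice).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, num_merges=3)
```

On this corpus, the first three merges are `e+s`, `es+t`, and `l+o`. The number of merges sets the final vocabulary size, which is the knob whose limited effect the study reports.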
The paper introduces AraTrust, a new benchmark for evaluating the trustworthiness of LLMs when prompted in Arabic. The benchmark contains 522 multiple-choice questions covering dimensions like truthfulness, ethics, safety, and fairness. Experiments using AraTrust showed that GPT-4 performed the best, while open-source models like AceGPT 7B and Jais 13B had lower scores. Why it matters: This benchmark addresses a critical gap in evaluating LLMs for Arabic, which is essential for ensuring the safe and ethical deployment of AI in the Arab world.
This paper introduces a new Single Domain Generalization (SDG) method called ConDiSR for medical image classification, using channel-wise contrastive disentanglement and reconstruction-based style regularization. The method is evaluated on multicenter histopathology image classification, achieving a 1% improvement in average accuracy compared to state-of-the-art SDG baselines. Code is available at https://github.com/BioMedIA-MBZUAI/ConDiSR.
Researchers from MBZUAI have developed XReal, a diffusion model for generating realistic chest X-ray images with precise control over anatomy and pathology location. The model uses an Anatomy Controller and a Pathology Controller to introduce spatial control into a pre-trained text-to-image diffusion model without fine-tuning. XReal outperforms existing X-ray diffusion models in realism, as measured by quantitative metrics and radiologists' ratings, and its code and model weights are publicly available.
KAUST researchers have developed a new synthetic biology process using metabolically engineered algae to produce fragrant sesquiterpenoids, the core compounds in agarwood and other perfumes. The process, developed by the Lauersen and Szekely groups, achieved yields 25 times higher than previous methods and allows for the synthesis of 103 types of fragrant sesquiterpenoids. It also incorporates an energy-efficient nanofiltration step and operates at room temperature with minimal waste. Why it matters: This sustainable bioprocess offers a green alternative to environmentally damaging harvesting of natural resources for the $44 billion fragrance industry, with potential applications in drug development.
KAUST scientists have developed a new perovskite solar cell design that uses thin perovskite layers at both the top and bottom interfaces. The design achieves a power conversion efficiency of 25.6%, comparable to silicon solar cells, and loses only 5% of its efficiency after 1,000 hours of exposure to high heat. The key innovation is a specific ligand that interacts effectively with the 3D perovskite for passivation while maintaining the purity of the thin layers. Why it matters: This advance improves the stability and efficiency of perovskite solar cells, making them a more viable and cost-effective alternative to silicon, especially for countries such as Saudi Arabia that aim to increase their reliance on renewable energy.