Researchers at MBZUAI have demonstrated a method called "Data Laundering" to artificially boost language model benchmark scores using knowledge distillation. The technique covertly transfers benchmark-specific knowledge, leading to inflated accuracy without genuine improvements in reasoning. The study highlights a vulnerability in current AI evaluation practices and calls for more robust benchmarks.
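To illustrate the mechanism the paper exploits, here is a minimal sketch of standard knowledge distillation in PyTorch; the logits, temperature, and batch below are toy placeholders, not the authors' experimental setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Standard KD loss: KL divergence between temperature-softened distributions.

    If the teacher was exposed (even indirectly) to benchmark test data,
    this loss quietly transfers that benchmark-specific knowledge to the student.
    """
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 scaling keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T

# Toy usage: a batch of 4 examples over a 10-class output space
teacher_logits = torch.randn(4, 10)   # stand-in for a benchmark-contaminated teacher
student_logits = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()                       # gradients flow to the student only
```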
MBZUAI researchers introduce UniMed-CLIP, a unified Vision-Language Model (VLM) for diverse medical imaging modalities, trained on the new large-scale, open-source UniMed dataset. UniMed comprises over 5.3 million image-text pairs spanning six modalities: X-ray, CT, MRI, Ultrasound, Pathology, and Fundus, created by using LLMs to convert existing classification datasets into image-text pairs. In zero-shot evaluations, UniMed-CLIP significantly outperforms existing generalist VLMs and matches modality-specific medical VLMs, improving over BiomedCLIP by an average of +12.61 points across 21 datasets while using 3x less training data.
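UniMed-CLIP follows the CLIP training recipe, so a minimal sketch of the symmetric contrastive objective such models optimize may help; the embeddings here are random toy tensors rather than outputs of the paper's encoders.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, logit_scale=100.0):
    """Symmetric InfoNCE loss used by CLIP-style VLMs.

    Each image-text pair in the batch is a positive; every other
    pairing in the batch serves as a negative.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * image_emb @ text_emb.t()   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matched pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)     # text -> image direction
    return (loss_i + loss_t) / 2

# Toy batch: 8 image-text pairs with 512-dimensional embeddings
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```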
MBZUAI releases BiMediX2, a bilingual (Arabic-English) Bio-Medical Large Multimodal Model, along with the BiMed-V dataset (1.6M samples) and the BiMed-MBench evaluation benchmark. BiMediX2 supports multi-turn conversations in Arabic and English and handles diverse medical imaging modalities. The model achieves state-of-the-art results on medical LLM and LMM benchmarks, outperforming existing methods and GPT-4 on several evaluations.
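For readers who want to experiment, below is a text-only sketch of multi-turn bilingual use through the Hugging Face transformers library; the checkpoint name is a hypothetical placeholder, and the actual release also handles images, so consult the official BiMediX2 repository for the real model ID and multimodal processor.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name for illustration only; see the official
# BiMediX2 release for the actual model ID.
MODEL_ID = "MBZUAI/BiMediX2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# A multi-turn conversation mixing English and Arabic turns
messages = [
    {"role": "user", "content": "What does an elevated HbA1c level indicate?"},
    {"role": "assistant", "content": "It suggests poor long-term blood glucose control."},
    {"role": "user", "content": "ما هي الخطوات التالية الموصى بها؟"},  # "What next steps are recommended?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```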
Researchers propose a spatio-temporal model for high-resolution wind forecasting in Saudi Arabia that combines Echo State Networks with stochastic partial differential equations (SPDEs). The model reduces the spatial dimension of the data via energy distance, captures temporal dynamics with a sparse recurrent neural network, and reconstructs the full-resolution field using a non-stationary SPDE approach. It yields more accurate forecasts of wind speed and wind energy, potentially saving up to one million dollars per year compared to existing models.
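The Echo State Network at the model's core is a standard reservoir-computing construction; below is a minimal NumPy sketch of one, with toy data and illustrative hyperparameters rather than the paper's energy-distance-reduced wind inputs or its SPDE reconstruction step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Echo State Network: a large, fixed, sparse random reservoir whose
# linear readout is the only trained component (ridge regression here).
n_in, n_res = 3, 200                               # inputs ~ a few reduced spatial knots
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W[rng.random((n_res, n_res)) > 0.1] = 0.0          # keep ~10% of connections (sparse)
W *= 0.9 / max(abs(np.linalg.eigvals(W)))          # spectral radius < 1: echo state property

def run_reservoir(U, leak=0.3):
    """Drive the reservoir with an input sequence U of shape (T, n_in)."""
    X, x = np.zeros((len(U), n_res)), np.zeros(n_res)
    for t, u in enumerate(U):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        X[t] = x
    return X

# Toy series standing in for wind-speed measurements at reduced locations
U = rng.standard_normal((500, n_in)).cumsum(axis=0) * 0.1
X, Y = run_reservoir(U[:-1]), U[1:]                # one-step-ahead targets
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)
pred = X @ W_out                                   # trained readout forecasts
```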