Researchers at MBZUAI have demonstrated a method called "Data Laundering" to artificially boost language model benchmark scores using knowledge distillation. The technique covertly transfers benchmark-specific knowledge, leading to inflated accuracy without genuine improvements in reasoning. The study highlights a vulnerability in current AI evaluation practices and calls for more robust benchmarks.
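For readers unfamiliar with knowledge distillation, the sketch below shows the standard temperature-softened loss that pulls a student model's output distribution toward a teacher's. It is a generic illustration, not the Data Laundering procedure itself: the function names, the temperature value, and the use of a KL-divergence loss are assumptions made for this example.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis, after temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. Hypothetical helper for illustration.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (soft targets)
    log_q = log_softmax(student_logits, temperature)  # student
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return float(kl.mean() * temperature ** 2)

# Toy logits for a batch of one example with three answer options.
teacher = np.array([[2.0, 1.0, 0.1]])
student = np.array([[0.1, 1.0, 2.0]])
loss = distillation_loss(student, teacher)
```

Minimizing this loss makes the student imitate the teacher's answer preferences, which is how whatever the teacher has absorbed, including benchmark-specific knowledge, can pass to the student without the student ever seeing the benchmark directly.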
MBZUAI researchers introduce UniMed-CLIP, a unified Vision-Language Model (VLM) for diverse medical imaging modalities, trained on the new large-scale, open-source UniMed dataset. UniMed comprises over 5.3 million image-text pairs across six modalities: X-ray, CT, MRI, Ultrasound, Pathology, and Fundus, created using LLMs to transform classification datasets into image-text formats. UniMed-CLIP significantly outperforms existing generalist VLMs and matches modality-specific medical VLMs in zero-shot evaluations, with an average gain of 12.61 over BiomedCLIP across 21 datasets while using one-third as much training data.
Researchers propose a spatio-temporal model for high-resolution wind forecasting in Saudi Arabia using Echo State Networks and stochastic partial differential equations. The model reduces spatial information via energy distance, captures dynamics with a sparse recurrent neural network, and reconstructs data using a non-stationary stochastic partial differential equation approach. The model achieves more accurate forecasts of wind speed and energy, potentially saving up to one million dollars annually compared to existing models.
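The Echo State Network component mentioned above can be sketched as a fixed, sparse random reservoir with a linear readout trained by ridge regression. This is a minimal illustration of a generic ESN on a toy sine series, not the authors' model: it omits the energy-distance reduction and the stochastic partial differential equation reconstruction, and every name and hyperparameter (reservoir size, sparsity, spectral radius, ridge penalty) is an assumption.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, sparsity=0.9, spectral_radius=0.9, seed=0):
    """Build fixed random weights; most recurrent weights are zeroed out (sparse)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (n_reservoir, n_reservoir))
    W[rng.random((n_reservoir, n_reservoir)) < sparsity] = 0.0
    # Rescale so the largest eigenvalue magnitude equals the target spectral radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
    return W, W_in

def run_reservoir(W, W_in, inputs):
    """Drive the reservoir with an input sequence and collect its states."""
    state = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        state = np.tanh(W @ state + W_in @ u)
        states.append(state)
    return np.array(states)

def fit_readout(states, targets, ridge=1e-6):
    """Ridge regression for the linear readout; only these weights are trained."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# Toy task: one-step-ahead prediction of a sine wave (stand-in for a wind series).
series = np.sin(np.linspace(0, 20 * np.pi, 1000))
inputs, targets = series[:-1, None], series[1:]
W, W_in = make_reservoir(n_inputs=1, n_reservoir=100)
states = run_reservoir(W, W_in, inputs)
washout = 100  # discard early states that still depend on the zero initial state
w_out = fit_readout(states[washout:], targets[washout:])
predictions = states[washout:] @ w_out
mse = float(np.mean((predictions - targets[washout:]) ** 2))
```

Because the recurrent weights stay fixed and only the readout is fitted, training reduces to a single linear solve, which is part of what makes this family of models practical for large forecasting problems.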
MBZUAI releases BiMediX2, a bilingual (Arabic-English) Bio-Medical Large Multimodal Model, along with the BiMed-V dataset (1.6M samples) and BiMed-MBench evaluation benchmark. BiMediX2 supports multi-turn conversation in Arabic and English and handles diverse medical imaging modalities. The model achieves state-of-the-art results on medical LLM and LMM benchmarks, outperforming existing methods and GPT-4 in specific evaluations.