Search

Results for "mixture of experts"

Understanding the mixture of the expert layer in Deep Learning

MBZUAI · Invalid Date

A Mixture of Experts (MoE) layer is a sparsely activated deep learning layer. It uses a router network to direct each token to one of the experts. Yuanzhi Li, an assistant professor at CMU and affiliated faculty at MBZUAI, researches deep learning theory and NLP. Why it matters: This highlights MBZUAI's engagement with cutting-edge deep learning research, specifically in efficient model design.

Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

arXiv · Jan 23

This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.

BiMediX: Bilingual Medical Mixture of Experts LLM

arXiv · Feb 20

MBZUAI researchers introduce BiMediX, a bilingual (English and Arabic) mixture of experts LLM for medical applications. The model is trained on BiMed1.3M, a new 1.3 million bilingual instruction dataset and outperforms existing models like Med42 and Jais-30B on medical benchmarks. Code and models are available on Github.