Michael Kampffmeyer from UiT The Arctic University of Norway presented a talk at MBZUAI on representation learning for deep clustering and few-shot learning. The talk covered deep clustering in multi-view settings and the influence of geometrical representation properties on few-shot classification performance. He specifically discussed embedding representations on the hypersphere and its connection to the hubness phenomenon. Why it matters: This highlights MBZUAI's role in hosting discussions on advanced machine learning topics like few-shot learning, which are crucial for addressing data scarcity challenges in the region and beyond.
This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.
This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.
Researchers at MBZUAI have developed DynaMMo, a dynamic model merging method for efficient class incremental learning using medical images. DynaMMo merges multiple networks at different training stages using lightweight learnable modules, reducing computational overhead. Evaluated on three datasets, DynaMMo achieved a 10-fold reduction in GFLOPS compared to existing dynamic methods with a 2.76 average accuracy drop.
This survey paper reviews recent literature on continual learning in medical imaging, addressing challenges like catastrophic forgetting and distribution shifts. It covers classification, segmentation, detection, and other tasks, while providing a taxonomy of studies and identifying challenges. The authors also maintain a GitHub repository to keep the survey up-to-date with the latest research.
MBZUAI Professor Salman Khan is researching continuous, lifelong learning systems for computer vision, aiming to mimic human learning processes like curiosity and discovery. His work focuses on learning from limited data and adversarial robustness of deep neural networks. Khan, along with MBZUAI professors Fahad Khan and Rao Anwer, and partners from other universities, presented research at CVPR 2022. Why it matters: This research has the potential to significantly improve the ability of AI systems to understand and adapt to the real world, enabling more intelligent autonomous systems.
A recent talk at MBZUAI discussed "Green Learning" and Operational Neural Networks (ONNs) as efficient alternatives to CNNs. ONNs use "nodal" and "pool" operators and "generative neurons" to expand neuron learning capacity. Moncef Gabbouj from Tampere University presented Self-Organized ONNs (Self-ONNs) and their signal processing applications. Why it matters: Exploring more efficient AI models is crucial for sustainable development of AI in the region, as it addresses computational resource constraints and promotes broader accessibility.
This article discusses retrieval augmentation in text generation, where information retrieved from an external source is used to condition predictions. It references recent work on retrieval-augmented image captioning, showing that model size can be greatly reduced when training data is available through retrieval. The author intends to continue this work focusing on the intersection of retrieval augmentation and in-context learning, and controllable image captioning for language learning materials. Why it matters: This research direction has the potential to improve transfer learning in vision-language models, which could be especially relevant for downstream applications in Arabic NLP and multimodal tasks.