This seminar explores vision systems through self-supervised representation learning, addressing challenges and solutions in mainstream vision self-supervised learning methods. It discusses developing versatile representations across modalities, tasks, and architectures to propel the evolution of vision foundation models. The talk will be led by Tong Zhang of EPFL, who previously studied at Beihang University, New York University, and the Australian National University. Why it matters: Advancing vision foundation models is crucial for expanding AI applications, especially in the Middle East, where computer vision can address challenges in areas like urban planning, agriculture, and environmental monitoring.
Michael Kampffmeyer from UiT The Arctic University of Norway presented a talk at MBZUAI on representation learning for deep clustering and few-shot learning. The talk covered deep clustering in multi-view settings and the influence of geometric properties of representations on few-shot classification performance. He specifically discussed embedding representations on the hypersphere and their connection to the hubness phenomenon. Why it matters: This highlights MBZUAI's role in hosting discussions on advanced machine learning topics like few-shot learning, which are crucial for addressing data scarcity challenges in the region and beyond.
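The hubness phenomenon mentioned above can be made concrete with a small experiment: in high-dimensional spaces, a few "hub" points tend to appear in many other points' nearest-neighbor lists. The sketch below is illustrative only (synthetic random embeddings, not the talk's method); the function and variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: random embeddings L2-normalized onto the unit hypersphere,
# echoing the hyperspherical representations discussed in the talk.
X = rng.normal(size=(500, 64))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# k-occurrence: how often each point appears in another point's k-NN list.
k = 10
sims = X @ X.T
np.fill_diagonal(sims, -np.inf)            # exclude self-neighbors
knn = np.argsort(-sims, axis=1)[:, :k]     # top-k cosine neighbors per point
k_occurrence = np.bincount(knn.ravel(), minlength=len(X))

# A positively skewed k-occurrence distribution signals hubness:
# a handful of points dominate many neighbor lists.
mean, std = k_occurrence.mean(), k_occurrence.std()
skew = ((k_occurrence - mean) ** 3).mean() / std**3
print(f"mean k-occurrence: {mean:.1f}, skewness: {skew:.2f}")
```

Note that the mean k-occurrence is always exactly `k`; hubness shows up in the spread and skew of the distribution, not its mean.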
The paper introduces TimeHUT, a new method for learning time-series representations using hierarchical uniformity-tolerance balancing of contrastive representations. TimeHUT employs a hierarchical setup to learn both instance-wise and temporal information, along with a temperature scheduler to balance uniformity and tolerance. The method was evaluated on the UCR, UEA, Yahoo, and KPI datasets, demonstrating superior performance in classification tasks and competitive results in anomaly detection.
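The uniformity-tolerance trade-off can be sketched in code: in contrastive losses, a low temperature pushes embeddings toward uniformity on the hypersphere, while a high temperature is more tolerant of semantically close negatives. The snippet below is a generic illustration of scheduling the temperature in an NT-Xent-style loss, not TimeHUT's actual scheduler; all function names are my own.

```python
import numpy as np

def ntxent_loss(z, temperature):
    """NT-Xent-style contrastive loss over embeddings z, where rows
    z[2i] and z[2i+1] are assumed to form a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    np.fill_diagonal(logits, -np.inf)          # mask self-similarity
    n = len(z)
    pos = np.arange(n) ^ 1                     # partner index of each row
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), pos].mean()

def cosine_temperature(step, total_steps, t_min=0.07, t_max=0.7):
    """Oscillate the temperature over training: small values emphasize
    uniformity, large values emphasize tolerance."""
    return t_min + 0.5 * (t_max - t_min) * (1 + np.cos(2 * np.pi * step / total_steps))

# Usage: at each training step, evaluate the loss with the scheduled temperature.
rng = np.random.default_rng(1)
z = rng.normal(size=(8, 4))
loss = ntxent_loss(z, cosine_temperature(step=0, total_steps=100))
```

The design point is that temperature is not a fixed hyperparameter here but a schedule, so the loss alternately favors spreading representations apart and preserving local semantic structure.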
A talk introduces a computational framework for learning a compact, structured representation of real-world datasets that is both discriminative and generative. It proposes learning a closed-loop transcription between the distribution of a high-dimensional multi-class dataset and an arrangement of multiple independent subspaces, known as a linear discriminative representation (LDR). The optimality of the closed-loop transcription can be characterized in closed form by an information-theoretic measure known as the rate reduction. Why it matters: The framework unifies the concepts and benefits of auto-encoders and GANs, and generalizes them to learning a representation of multi-class visual data that is both discriminative and generative.
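For context, the rate reduction measure referenced above takes, in the maximal coding rate reduction (MCR²) line of work, a form along these lines (stated here from memory as background, not verbatim from the talk): for features $Z = [z_1, \dots, z_m] \in \mathbb{R}^{d \times m}$ partitioned into $k$ classes with $m_j$ samples in class $j$ and quantization error $\epsilon$,

$$
\Delta R(Z, \Pi, \epsilon) \;=\; \underbrace{\frac{1}{2}\log\det\!\Big(I + \frac{d}{m\epsilon^2} Z Z^{\top}\Big)}_{\text{coding rate of all features}} \;-\; \underbrace{\sum_{j=1}^{k} \frac{m_j}{2m}\log\det\!\Big(I + \frac{d}{m_j\epsilon^2} Z_j Z_j^{\top}\Big)}_{\text{average coding rate within classes}},
$$

so maximizing $\Delta R$ expands the volume of the whole feature set while compressing each class onto its own subspace, which is what makes the learned LDR simultaneously discriminative and compact.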
The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.
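The low- versus high-frequency distinction at the heart of the Prism Hypothesis can be illustrated with a simple FFT-based band split. This is a toy sketch with an ideal (hard) frequency mask, not the paper's frequency-band modulator; the function name and cutoff parameter are my own.

```python
import numpy as np

def band_split(image, cutoff=0.1):
    """Split a 2D image into low- and high-frequency components.
    `cutoff` is the fraction of the Nyquist radius kept in the low band.
    The low band is the kind of content a semantic encoder captures;
    the high band carries the pixel-level detail."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    mask = radius <= cutoff * min(h, w) / 2          # ideal low-pass mask
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = image - low                               # exact complement
    return low, high
```

By construction the two bands sum back to the original image, which mirrors the intuition that a unified model must retain both components to achieve pixel-level fidelity alongside semantic abstraction.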
Tailin Wu from Stanford presented research at MBZUAI on using machine learning to accelerate scientific discovery and simulation. The work covers learning theories from dynamical systems with improved accuracy and interpretability. It also introduces LAMP, a deep learning model that optimizes spatial resolution in simulations. Why it matters: Efficient AI-driven scientific simulation has broad implications for research in physics, biomedicine, materials science, and engineering across the region.