GCC AI Research


Results for "focal modulation"

Making sense of space and time in video

MBZUAI

MBZUAI researchers led by Syed Talal Wasim presented a new approach to video analysis at ICCV in Paris. The approach builds on still-image processing techniques such as focal modulation, analyzing the spatial and temporal information in a video separately. It aims to improve temporal aggregation while avoiding the computational complexity of transformers. Why it matters: This research advances video understanding in computer vision by offering a more efficient method for temporal modeling, which is crucial for applications such as activity recognition and video surveillance.
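The idea of aggregating spatial and temporal context separately can be illustrated with a small NumPy sketch of focal modulation as introduced in FocalNets: stacked depthwise convolutions with growing receptive fields, gated aggregation of each level plus a global level, and a projection whose output element-wise modulates a query. This is not the researchers' code. The function names, kernel sizes, random weights, and the choice to multiply the query by both a spatial and a temporal modulator are assumptions made for illustration.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv(x, kernel, axes):
    """'Same'-padded depthwise cross-correlation of channels-last x over `axes`.
    kernel shape: (k,) * len(axes) + (C,), with k odd."""
    k = kernel.shape[0]
    pad_width = [(0, 0)] * x.ndim
    for a in axes:
        pad_width[a] = (k // 2, k // 2)
    xp = np.pad(x, pad_width)
    out = np.zeros_like(x)
    for offsets in np.ndindex(*kernel.shape[:-1]):
        window = [slice(None)] * x.ndim
        for a, o in zip(axes, offsets):
            window[a] = slice(o, o + x.shape[a])
        out += xp[tuple(window)] * kernel[offsets]  # one weight per channel
    return out

def modulator(x, p, axes):
    """Focal-modulation context over `axes`: stacked depthwise convs
    (growing receptive field), gated aggregation, plus a global level."""
    ctx = x @ p["Wc"]
    gates = x @ p["Wg"]                      # one gate per focal level + global
    levels = len(p["kernels"])
    agg = np.zeros_like(ctx)
    for l, kern in enumerate(p["kernels"]):
        ctx = gelu(depthwise_conv(ctx, kern, axes))
        agg += ctx * gates[..., l:l + 1]
    glob = gelu(ctx.mean(axis=axes, keepdims=True))  # global context level
    agg += glob * gates[..., levels:levels + 1]
    return agg @ p["Wh"]                     # 1x1 projection to the modulator

def make_params(rng, C, n_axes, kernel_sizes=(3, 5)):
    # Hypothetical random weights; a real model would learn these.
    return {
        "Wc": rng.normal(0, 0.1, (C, C)),
        "Wg": rng.normal(0, 0.1, (C, len(kernel_sizes) + 1)),
        "Wh": rng.normal(0, 0.1, (C, C)),
        "kernels": [rng.normal(0, 0.1, (k,) * n_axes + (C,)) for k in kernel_sizes],
    }

def spatiotemporal_focal_modulation(x, Wq, p_spatial, p_temporal):
    """x: video features (T, H, W, C). Spatial context is aggregated per frame
    over (H, W); temporal context per location over T. Both modulators
    multiply the same query (an assumption for this sketch)."""
    q = x @ Wq
    m_s = modulator(x, p_spatial, axes=(1, 2))
    m_t = modulator(x, p_temporal, axes=(0,))
    return q * m_s * m_t

rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 16
video = rng.normal(size=(T, H, W, C))
out = spatiotemporal_focal_modulation(
    video, rng.normal(0, 0.1, (C, C)),
    make_params(rng, C, n_axes=2), make_params(rng, C, n_axes=1))
print(out.shape)  # (4, 8, 8, 16)
```

Because the spatial branch only mixes positions within a frame and the temporal branch only mixes frames at a fixed position, the cost grows with the kernel sizes rather than quadratically with the number of tokens, as full self-attention would.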

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

arXiv

The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.
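The low/high frequency split behind the hypothesis can be illustrated with a Fourier-domain band decomposition. This NumPy sketch uses an ideal radial low-pass mask with an arbitrary cutoff; it is not UAE's frequency-band modulator, and the function name and cutoff value are assumptions. Under the hypothesis, the low band corresponds to what semantic encoders capture and the high band to the detail that pixel encoders retain.

```python
import numpy as np

def split_bands(img, cutoff):
    """Split a 2-D array into low- and high-frequency parts using an ideal
    radial mask in the Fourier domain (cutoff in cycles per pixel).
    By construction, low + high reconstructs img."""
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    low_mask = radius <= cutoff
    low = np.real(np.fft.ifft2(F * low_mask))  # mask is symmetric, so result is real
    high = img - low
    return low, high

y, x = np.mgrid[0:64, 0:64]
smooth = np.sin(2 * np.pi * x / 64)   # one cycle across the image: low frequency
texture = 0.3 * (-1.0) ** (x + y)     # checkerboard: highest representable frequency
low, high = split_bands(smooth + texture, cutoff=0.1)
print(np.allclose(low, smooth), np.allclose(high, texture))  # True True
```

The smooth gradient lands entirely in the low band and the checkerboard entirely in the high band, which is the kind of separation the hypothesis attributes to the two encoder families.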

Fernando Albarracin, Young Scientist Award, URSI GASS 2020

TII

Dr. Fernando Albarracin of the Technology Innovation Institute (TII) presented a novel microwave applicator design for hyperthermia, a technique with potential applications in cancer treatment. The design combines two flat dielectric graded-index (GRIN) lenses to concentrate electromagnetic energy in a specific spot within the tissue. Because it accounts for the interface between free space and human tissue, the system offers an alternative to conventional antenna-based applicators. Why it matters: This research introduces a new approach to hyperthermia treatment that could improve the precision and effectiveness of cancer therapy in the region.
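To make the focusing idea concrete, the following sketch computes the radial index profile of a single flat GRIN lens from the textbook equal-optical-path condition. It is a simplified assumption, not the presented applicator: the two-lens arrangement and the actual treatment of the free-space/tissue interface are not reproduced, the thin-lens approximation ignores ray bending inside the lens, and all numbers are hypothetical.

```python
import numpy as np

def grin_focusing_profile(r, focal_length, thickness, n_center, n_medium=1.0):
    """Refractive index n(r) of a flat GRIN lens that focuses a normally
    incident plane wave to a point `focal_length` behind the lens.
    Equal optical path for every ray (thin-lens approximation):
        n(r)*d + n_m*sqrt(r^2 + F^2) = n(0)*d + n_m*F
    """
    r = np.asarray(r, dtype=float)
    extra_path = n_medium * (np.sqrt(r**2 + focal_length**2) - focal_length)
    return n_center - extra_path / thickness

# Hypothetical numbers (all lengths in mm), not taken from the presented design.
r = np.linspace(0.0, 40.0, 5)
n = grin_focusing_profile(r, focal_length=60.0, thickness=20.0, n_center=3.0)
eps_r = n**2  # relative permittivity a dielectric implementation would need
```

The index falls away from the axis, so central rays are slowed relative to outer rays and all paths reach the focus in phase. Setting `n_medium` to a tissue value would model focusing inside tissue only crudely, since real tissue is lossy and dispersive at microwave frequencies.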

FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis

arXiv

Researchers at MBZUAI introduce FissionFusion, a hierarchical model-merging approach for improving medical image analysis. The method aggregates models locally and globally according to their hyperparameter configurations, and uses a cyclical learning rate scheduler to generate candidate models efficiently. In experiments, FissionFusion outperforms standard model souping by approximately 6% on the HAM10000 and CheXpert datasets and improves out-of-distribution (OOD) performance.
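A minimal sketch of two-level (hierarchical) souping, assuming each trained model is a dict of NumPy arrays tagged with its hyperparameters. Models sharing a hyperparameter value are first averaged locally, then the local soups are averaged globally. Uniform averaging and the triangular cyclical schedule are simplifying assumptions; FissionFusion's actual grouping, selection rules, and scheduler may differ, and all names here are hypothetical.

```python
import numpy as np
from collections import defaultdict

def uniform_soup(models):
    """Average a list of parameter dicts with identical keys and shapes."""
    return {k: np.mean([m[k] for m in models], axis=0) for k in models[0]}

def hierarchical_soup(runs, group_key):
    """runs: list of (hyperparams dict, params dict).
    Local step: average models that share the hyperparameters in group_key.
    Global step: average the resulting local soups."""
    groups = defaultdict(list)
    for hp, params in runs:
        groups[tuple(hp[k] for k in group_key)].append(params)
    local_soups = [uniform_soup(ms) for ms in groups.values()]
    return uniform_soup(local_soups)

def cyclical_lr(step, cycle_len, lr_min, lr_max):
    """Triangular cyclical learning rate: lr_min at each cycle boundary,
    lr_max at mid-cycle. Checkpoints saved at the low points can serve as
    cheap extra ingredients for the soup."""
    pos = (step % cycle_len) / cycle_len
    return lr_min + (lr_max - lr_min) * (1 - abs(2 * pos - 1))

runs = [
    ({"lr": 1e-3, "wd": 0.0}, {"w": np.array([1.0, 2.0])}),
    ({"lr": 1e-3, "wd": 1e-4}, {"w": np.array([3.0, 4.0])}),
    ({"lr": 1e-2, "wd": 0.0}, {"w": np.array([5.0, 6.0])}),
]
soup = hierarchical_soup(runs, group_key=("lr",))
print(soup["w"])  # [3.5 4.5]
```

The hierarchical result differs from a flat average of all three models (`[3. 4.]`): each hyperparameter group contributes equally, regardless of how many models it contains.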

Unlocking the Potential of Large Models for Vision Related Tasks

MBZUAI

Yanwei Fu from Fudan University will present research on multimodal models, robotic grasping, and fMRI neural decoding. Topics include few-shot learning, object-centered self-supervised learning, image manipulation, and visual-language alignment. The research also covers transformer compression and the application of large models, combined with multi-view stereo (MVS) 3D modeling, to robotic arm grasping. Why it matters: Although the talk is not directly about AI in the Middle East, the topics it covers are central to advancing AI research and applications in the region.