MBZUAI Professor Fahad Khan is working on a unified theory of machine visual intelligence. His goal is to enable AI systems to better understand and function in complex, chaotic visual environments. The aim is to improve real-world applications like smart cities, personalized healthcare, and autonomous vehicles. Why it matters: This research could significantly advance AI's ability to perceive and interact with the real world, especially in challenging environments common in the developing world.
The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.
Margaret Livingstone, a neurobiology professor at Harvard Medical School, lectured at KAUST's Winter Enrichment Program 2018 on how art can reveal insights into the human brain. She discussed how artists have long understood the independent roles of color and luminance in visual perception. Livingstone highlighted examples from Picasso, Monet, and Warhol to illustrate how artists manipulate visual cues. Why it matters: This interdisciplinary approach can potentially lead to new understandings of how the brain processes visual information and inform advances in both neuroscience and art.
This seminar explores vision systems through self-supervised representation learning, addressing challenges and solutions in mainstream vision self-supervised learning methods. It discusses developing versatile representations across modalities, tasks, and architectures to propel the evolution of the vision foundation model. Tong Zhang from EPFL, with a background from Beihang University, New York University, and Australian National University, will lead the talk. Why it matters: Advancing vision foundation models is crucial for expanding AI applications, especially in the Middle East where computer vision can address challenges in areas like urban planning, agriculture, and environmental monitoring.
KAUST's Visual Computing Center (VCC) is researching computer vision, image processing, and machine learning, with applications in self-driving cars, surveillance, and security. Professor Bernard Ghanem is working on teaching machines to understand visual data semantically, similar to how humans perceive the world. Self-driving cars use visual sensors to interpret traffic signals and detect obstacles, while computer vision also assists governments and corporations with security applications like facial recognition and detecting unattended luggage. Why it matters: Advancements in computer vision at KAUST can contribute to innovations in autonomous vehicles and enhance security measures in the region.