The study analyzed over 1,000 images generated by ImageFX, DALL-E V3, and Grok for 56 Saudi professions and found significant gender imbalances and cultural inaccuracies. DALL-E V3 exhibited the strongest gender stereotyping, with 96% male depictions, particularly in leadership and technical roles. Why it matters: The findings underscore the need for diverse training data and culturally sensitive evaluation so that AI outputs are equitable and accurately reflect Saudi Arabia's labor market and culture.
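A figure such as "96% male depictions" is a per-model share of annotated images. The sketch below shows one way such a share could be tallied. It is not the study's actual pipeline: the function name `male_share_by_model`, the tuple layout, and the toy annotations are all assumptions made for illustration, and the toy numbers do not come from the study.

```python
from collections import Counter, defaultdict


def male_share_by_model(annotations):
    """Return the fraction of images labeled 'male' for each model.

    `annotations` is an iterable of (model, profession, perceived_gender)
    tuples. This layout is hypothetical, not the study's data format.
    """
    counts = defaultdict(Counter)
    for model, _profession, gender in annotations:
        counts[model][gender] += 1
    return {
        model: gender_counts["male"] / sum(gender_counts.values())
        for model, gender_counts in counts.items()
    }


# Toy, made-up annotations used only to exercise the function.
toy_annotations = [
    ("model_a", "engineer", "male"),
    ("model_a", "nurse", "female"),
    ("model_a", "manager", "male"),
    ("model_a", "teacher", "male"),
    ("model_b", "engineer", "male"),
    ("model_b", "nurse", "female"),
]

shares = male_share_by_model(toy_annotations)
for model, share in sorted(shares.items()):
    print(f"{model}: {share:.0%} male depictions")
```

Grouping the same tally by profession instead of by model would show which roles drive the imbalance.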
KAUST computer scientist Mohamed Elhoseiny and his VISION CAIR team developed Creative Walk Adversarial Networks (CWAN) for novel art generation. CWAN learns from existing art styles and deviates from them using 'random walk deviation' methods. Human evaluators preferred CWAN-generated art over art produced by other methods such as StyleGAN2. Why it matters: The research demonstrates AI's potential as a valuable tool for artists, enabling the creation of unique and meaningful art; it also explores more effective emotional language in image captioning.
Researchers from Carnegie Mellon University and MBZUAI have developed a new method called ConceptAligner for precise image editing using AI. The system decomposes text embeddings into independent building blocks called atomic concepts, allowing users to make targeted tweaks without generating entirely new images. Their approach ensures that each latent factor maps to a specific user-controllable dial, enabling accurate concept-level modifications. Why it matters: This research addresses a major limitation in AI image generation, enhancing its usefulness in industries where precise control is crucial, such as advertising and medicine, and improving the reliability of AI-driven creative tools.
MBZUAI researchers are working to improve computer vision models by incorporating common sense knowledge. They aim to address issues like the generation of unrealistic human features, such as hands with incorrect numbers of fingers. By integrating common-sense knowledge, like the fact that humans typically have five fingers per hand, they seek to make deep learning models more reliable. Why it matters: This research could improve the accuracy and trustworthiness of AI-generated content, making it more suitable for real-world applications.
Nicu Sebe from the University of Trento presented recent work on video generation, focusing on animating objects in a source image using external information like labels, driving videos, or text. He introduced a Learnable Game Engine (LGE) trained from monocular annotated videos, which maintains states of scenes, objects, and agents to render controllable viewpoints. Why it matters: This talk highlights advancements in cross-modal AI, potentially enabling new applications in gaming, simulation, and content creation within the region.