MBZUAI M.Sc. graduate Mohammad Hanan Ghani developed new techniques to improve text-to-image (T2I) generation from long text prompts by combining large language models with diffusion models. Advised by Dr. Salman Khan, Ghani published three papers, at ICLR, BMVC, and NeurIPS. The ICLR paper focuses on generating images that accurately reflect detailed text descriptions, improving on existing techniques in how closely the output follows the details of the input text. Why it matters: This research addresses a key limitation in current T2I models and advances the field of multimodal AI, potentially improving the capabilities of robots and autonomous devices.
The article discusses research on fine-tuning text-to-image diffusion models, including reward function training, online reinforcement learning (RL) fine-tuning, and addressing reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization. TextNorm, a method for confidence calibration in reward models, is presented to reduce over-optimization risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.
Researchers from Carnegie Mellon University and MBZUAI have developed a new method called ConceptAligner for precise image editing using AI. The system decomposes text embeddings into independent building blocks called atomic concepts, allowing users to make targeted tweaks without generating entirely new images. Their approach ensures that each latent factor maps to a specific user-controllable dial, enabling accurate concept-level modifications. Why it matters: This research addresses a major limitation in AI image generation, enhancing its usefulness in industries where precise control is crucial, such as advertising and medicine, and improving the reliability of AI-driven creative tools.
MBZUAI graduate Gokul Karthik Kumar credits the university's interdisciplinary approach with helping him discover his passion for working at the intersection of computer vision and NLP. He chose MBZUAI over the University of Waterloo because of the flexibility to explore different AI domains. Kumar will join G42’s Inception Institute of Artificial Intelligence (IIAI) as an applied scientist to develop large language models tailored for UAE-focused applications. Why it matters: This highlights MBZUAI's role in nurturing AI talent for the UAE and G42's focus on developing local LLMs.
Researchers at MBZUAI have introduced TiBiX, a novel approach leveraging temporal information from previous chest X-rays (CXRs) and reports for bidirectional generation of current CXRs and reports. TiBiX addresses two key challenges: generating current images from previous images and reports, and generating current reports from both previous and current images. The study also introduces a curated temporal benchmark dataset derived from the MIMIC-CXR dataset and achieves state-of-the-art results in report generation.