GCC AI Research

From Text to Image: M.Sc. graduate develops cutting-edge techniques to transform T2I generation

MBZUAI · Significant research

Summary

MBZUAI M.Sc. graduate Mohammad Hanan Gani developed new techniques to improve text-to-image generation from long text prompts by combining large language models and diffusion models. Advised by Dr. Salman Khan, Gani published three papers at ICLR, BMVC, and NeurIPS, with the ICLR paper focusing on generating images that accurately reflect detailed text descriptions. The system improves on existing techniques by producing images that more closely follow the details of the input text. Why it matters: This research addresses a key limitation in current T2I models and advances the field of multimodal AI, potentially improving the capabilities of robots and autonomous devices.
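The summary does not detail the exact pipeline, but the general pattern it describes, using an LLM to break a long, detailed prompt into per-object sub-prompts before handing them to a diffusion model, can be sketched roughly as follows. The function names and the clause-splitting stand-in for the LLM are illustrative assumptions, not the paper's actual API.

```python
import re

def decompose_prompt(long_prompt: str) -> list[str]:
    """Split a long scene description into per-object sub-prompts.

    In a real system this decomposition would be produced by prompting
    an LLM; a simple clause split stands in for it here (an
    illustrative assumption, not the published method).
    """
    clauses = re.split(r"[.;]\s*", long_prompt.strip())
    return [c.strip() for c in clauses if c.strip()]

def plan_generation(sub_prompts: list[str]) -> dict:
    """Placeholder for conditioning a diffusion model per sub-prompt.

    A real pipeline would render each element into its own region of
    the canvas; here we only return the planned layout.
    """
    return {"objects": sub_prompts, "num_regions": len(sub_prompts)}

prompt = ("A red bicycle leans against a brick wall. "
          "A tabby cat sleeps in a wicker basket. "
          "Autumn leaves cover the cobblestone street.")
layout = plan_generation(decompose_prompt(prompt))
print(layout["num_regions"])  # one region per scene element
```

Handling each scene element separately is what lets long prompts survive generation intact, rather than having the model latch onto one or two salient phrases.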


Related

Fine-tuning Text-to-Image Models: Reinforcement Learning and Reward Over-Optimization

MBZUAI ·

The article discusses research on fine-tuning text-to-image diffusion models, including reward function training, online reinforcement learning (RL) fine-tuning, and addressing reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization. TextNorm, a method for confidence calibration in reward models, is presented to reduce over-optimization risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.
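The summary does not give TextNorm's exact formulation, but its core idea, normalizing a text-image reward against scores for a set of contrastive prompts so the raw score behaves like a calibrated relative confidence, can be sketched minimally. Everything below, including the softmax form, is an illustrative assumption rather than the paper's equation.

```python
import math

def calibrated_reward(scores: dict[str, float], target: str) -> float:
    """Softmax-normalize a raw reward against contrastive prompts.

    `scores` maps each candidate prompt (the target plus contrastive
    alternatives) to the raw reward-model score for the same image.
    Normalizing turns an uncalibrated absolute score into a relative
    confidence, which discourages over-optimizing the raw reward.
    Illustrative sketch only, not TextNorm's published formula.
    """
    exps = {p: math.exp(s) for p, s in scores.items()}
    total = sum(exps.values())
    return exps[target] / total

raw = {"a photo of a red apple": 2.0,    # target prompt
       "a photo of a green apple": 0.5,  # contrastive prompt
       "a photo of a red car": -1.0}     # contrastive prompt
conf = calibrated_reward(raw, "a photo of a red apple")
```

The calibrated value is bounded in (0, 1) and only rises when the image matches the target prompt better than the alternatives, which is the property that guards against reward over-optimization.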

Alumni Spotlight: Aspiration Rooted in Research

MBZUAI ·

MBZUAI alumnus Hanan Gani, a 2024 master's graduate in machine learning, is now a research associate at MBZUAI working on a meteorological project with the UAE government. He also focuses on multimodal and embodied intelligence research, mentors AI students, and has published nine papers during his time at MBZUAI. His research includes work on vision transformers, text-to-image generation, and large multimodal models. Why it matters: Showcases MBZUAI's role in attracting and developing AI talent within the UAE, contributing to the nation's AI research capabilities.

Create and edit images like a smart artist

MBZUAI ·

Researchers from Carnegie Mellon University and MBZUAI have developed a new method called ConceptAligner for precise image editing using AI. The system decomposes text embeddings into independent building blocks called atomic concepts, allowing users to make targeted tweaks without generating entirely new images. Their approach ensures that each latent factor maps to a specific user-controllable dial, enabling accurate concept-level modifications. Why it matters: This research addresses a major limitation in AI image generation, enhancing its usefulness in industries where precise control is crucial, such as advertising and medicine, and improving the reliability of AI-driven creative tools.
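The decomposition the summary describes, a text embedding built from independent atomic concepts so that one "dial" can be turned without disturbing the rest, can be sketched as a sum of factor vectors. The random vectors and function names below are stand-ins; ConceptAligner's factors would be learned, not sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical atomic-concept vectors; in the actual system these
# would be learned factors of the text embedding, not random draws.
concepts = {"dog": rng.normal(size=8),
            "cat": rng.normal(size=8),
            "beach": rng.normal(size=8),
            "forest": rng.normal(size=8)}

def embed(active: list[str]) -> np.ndarray:
    """Compose a text embedding as a sum of atomic-concept vectors."""
    return np.sum([concepts[c] for c in active], axis=0)

def concept_edit(active: list[str], old: str, new: str) -> np.ndarray:
    """Concept-level edit: swap one factor, leave the rest untouched."""
    return embed([new if c == old else c for c in active])

original = embed(["dog", "beach"])
edited = concept_edit(["dog", "beach"], "beach", "forest")
delta = edited - original  # only the swapped concept's term changes
```

Because the factors are independent, the edit's effect on the embedding is exactly the difference between the two concept vectors, which is what makes the modification targeted rather than a full regeneration.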

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

arXiv ·

FancyVideo, a new video-generation model, introduces a Cross-frame Textual Guidance Module (CTGM) to enhance text-to-video models. CTGM uses a Temporal Information Injector and a Temporal Affinity Refiner to achieve frame-specific textual guidance, improving the model's comprehension of temporal logic. Experiments on the EvalCrafter benchmark demonstrate FancyVideo's state-of-the-art performance in generating dynamic and consistent videos; the model also supports image-to-video tasks.
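The core idea the summary attributes to CTGM, giving each frame its own textual guidance rather than conditioning every frame on one shared embedding, can be sketched in miniature. The linear blend and all names below are illustrative assumptions, not FancyVideo's actual architecture.

```python
import numpy as np

def frame_specific_guidance(text_emb: np.ndarray,
                            motion_emb: np.ndarray,
                            num_frames: int) -> np.ndarray:
    """Produce a distinct text-guidance vector for each frame.

    A rough stand-in for cross-frame textual guidance: temporal
    position modulates how strongly a motion-related component is
    injected into the shared text embedding, so guidance varies
    smoothly across frames instead of being identical everywhere.
    """
    weights = np.linspace(0.0, 1.0, num_frames)  # temporal position
    return np.stack([text_emb + w * motion_emb for w in weights])

text_emb = np.ones(4)                          # shared scene embedding
motion_emb = np.array([0.0, 1.0, 0.0, -1.0])   # temporal component
guidance = frame_specific_guidance(text_emb, motion_emb, num_frames=5)
# guidance[0] equals the base embedding; later frames drift from it
```

Per-frame conditioning of this kind is what lets a model express "the cat jumps" as an evolving motion rather than five copies of the same pose.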