GCC AI Research

Edge Computing

A compact multimodal model for real-time video understanding on edge devices

MBZUAI · CV Research

MBZUAI researchers developed Mobile-VideoGPT, a compact and efficient multimodal model for real-time video understanding on edge devices. The system combines keyframe selection, efficient token projection, and a Qwen-2.5-0.5B language model. In testing, Mobile-VideoGPT ran faster and performed better than the models it was compared against while being significantly smaller. The model and code are publicly available. Why it matters: Processing video on the device itself reduces reliance on remote servers and addresses privacy concerns, which can accelerate the adoption of AI in mobile and embedded applications.