KAUST's Image and Video Understanding Lab is developing machine learning algorithms for computer vision and object tracking, with applications in video content search and UAV navigation. Their algorithms can detect specific activities in videos, helping platforms detect unwanted content and deliver relevant ads. The object tracking algorithm is also used to empower UAVs, enabling them to follow objects autonomously. Why it matters: This research enhances video content analysis and UAV capabilities, positioning KAUST as a leader in computer vision and AI applications within the region.
Researchers at MBZUAI have introduced a novel approach to enhance Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. This method uses a track encoder to embed spatial and temporal data, enriching visual queries and improving the LMM's understanding of driving scenarios. Experiments on DriveLM-nuScenes and DriveLM-CARLA benchmarks demonstrate significant improvements in perception, planning, and prediction tasks compared to baseline models.
This paper presents a decentralized multi-agent unmanned aerial system designed for search, pickup, and relocation of objects. The system integrates multi-agent aerial exploration, object detection/tracking, and aerial gripping. The decentralized system uses global state estimation, reactive collision avoidance, and sweep planning for exploration. Why it matters: The system's successful deployment in demonstrations and competitions like MBZIRC highlights the potential of integrated robotic solutions for complex tasks such as search and rescue in the region.
This paper introduces a novel approach for monitoring and analyzing the evolution of complex geographic objects in satellite image time-series. The method uses a spatiotemporal graph and constraint satisfaction problems (CSP) to model and analyze object changes. Experiments on real-world satellite images from Saudi Arabian cities demonstrate the effectiveness of the proposed approach.
Researchers at the University of Maryland have developed an AI system that can identify objects hidden by camouflage. The AI uses a convolutional neural network trained on synthetic data to detect partially occluded objects. The system outperformed existing object detection methods in tests on real-world images. Why it matters: The work demonstrates potential applications of AI in defense, security, and search and rescue operations in the Middle East and elsewhere.
A proposed recognition system aims to identify missing persons, deceased individuals, and lost objects during the Hajj and Umrah pilgrimages in Saudi Arabia. The system intends to leverage facial recognition and object identification to manage the large crowds expected in the coming decade, estimated to reach 20 million pilgrims. It will be integrated into the CrowdSensing system for crowd estimation, management, and safety.
MBZUAI researchers introduce PG-Video-LLaVA, a large multimodal model with pixel-level grounding capabilities for videos, integrating audio cues for enhanced understanding. The model uses an off-the-shelf tracker and grounding module to localize objects in videos based on user prompts. PG-Video-LLaVA is evaluated on video question-answering and grounding benchmarks, using Vicuna instead of GPT-3.5 for reproducibility.
MBZUAI researchers are presenting a new approach to open-world object detection at the AAAI conference. The method enables machines to distinguish between known and unknown objects in images, and then learn to classify the unknown objects. PhD student Sahal Shaji Mullappilly is the lead author of the study, titled "Semi-Supervised Open-World Detection". Why it matters: This research addresses a key limitation in current object detection systems, allowing for more adaptable and robust AI in real-world applications.