Skip to content
GCC AI Research

Search

Results for "temporal framing"

Uncovering Temporal Framing in the News

arXiv ·

Researchers from MBZUAI have proposed a new taxonomy of eight temporal frames and studied their persuasive use in news discourse. They created a multilingual dataset by expertly annotating 458 English and German news articles, identifying over 2,000 temporally framed sentences and approximately 3,000 annotations. Their experiments demonstrated that temporal framing is learnable at the sentence level, with supervised models significantly outperforming zero-shot classification approaches. Why it matters: This research provides a valuable dataset and methodology for understanding how time-related language shapes interpretation in news, contributing to advancements in NLP for media analysis and potentially countering disinformation.

Making sense of space and time in video

MBZUAI ·

MBZUAI researchers presented a new approach to video analysis at ICCV in Paris, led by Syed Talal Wasim. The approach builds on still image processing techniques like focal modulation to analyze spatial and temporal information in video separately. It aims to improve temporal aggregation while avoiding the computational complexity of transformers. Why it matters: This research advances video understanding in computer vision by offering a more efficient method for temporal modeling, crucial for applications like activity recognition and video surveillance.

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

arXiv ·

FancyVideo, a new video generator, introduces a Cross-frame Textual Guidance Module (CTGM) to enhance text-to-video models. CTGM uses a Temporal Information Injector and Temporal Affinity Refiner to achieve frame-specific textual guidance, improving comprehension of temporal logic. Experiments on the EvalCrafter benchmark demonstrate FancyVideo's state-of-the-art performance in generating dynamic and consistent videos, also supporting image-to-video tasks.

Modeling Complex Object Changes in Satellite Image Time-Series: Approach based on CSP and Spatiotemporal Graph

arXiv ·

This paper introduces a novel approach for monitoring and analyzing the evolution of complex geographic objects in satellite image time-series. The method uses a spatiotemporal graph and constraint satisfaction problems (CSP) to model and analyze object changes. Experiments on real-world satellite images from Saudi Arabian cities demonstrate the effectiveness of the proposed approach.

Multimodal Factual Knowledge Acquisition

MBZUAI ·

Manling Li from UIUC proposes a new research direction: Event-Centric Multimodal Knowledge Acquisition, which transforms traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. The approach addresses challenges in understanding multimodal semantic structures using zero-shot cross-modal transfer (CLIP-Event) and long-horizon temporal dynamics through the Event Graph Model. Li's work aims to enable machines to capture complex timelines and relationships, with applications in timeline generation, meeting summarization, and question answering. Why it matters: This research pioneers a new approach to multimodal information extraction, moving from static entity-based understanding to dynamic, event-centric knowledge acquisition, which is essential for advanced AI applications in understanding complex scenarios.

A Novel CNN-LSTM-based Approach to Predict Urban Expansion

arXiv ·

This paper introduces a novel two-step method for predicting urban expansion using time-series satellite imagery. The approach combines semantic image segmentation with a CNN-LSTM model to learn temporal features. Experiments on satellite images from Riyadh, Jeddah, and Dammam in Saudi Arabia demonstrate improved performance compared to existing methods based on Mean Square Error, Root Mean Square Error, Peak Signal to Noise Ratio, Structural Similarity Index, and overall classification accuracy.

VideoMolmo: Spatio-Temporal Grounding Meets Pointing

arXiv ·

Researchers from MBZUAI have introduced VideoMolmo, a large multimodal model for spatio-temporal pointing conditioned on textual descriptions. The model incorporates a temporal module with an attention mechanism and a temporal mask fusion pipeline using SAM2 for improved coherence across video sequences. They also curated a dataset of 72k video-caption pairs and introduced VPoS-Bench, a benchmark for evaluating generalization across real-world scenarios, with code and models publicly available.