GCC AI Research


Results for "LSTM"

A Novel CNN-LSTM-based Approach to Predict Urban Expansion

arXiv ·

This paper introduces a novel two-step method for predicting urban expansion from time-series satellite imagery. The approach first applies semantic image segmentation to each frame, then uses a CNN-LSTM model to learn temporal features across frames. Experiments on satellite images of Riyadh, Jeddah, and Dammam in Saudi Arabia show improved performance over existing methods, as measured by Mean Square Error, Root Mean Square Error, Peak Signal-to-Noise Ratio, Structural Similarity Index, and overall classification accuracy.
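The two-step structure (per-frame segmentation, then a recurrent model over the resulting masks) can be sketched in a few lines of NumPy. Everything here is illustrative: the threshold-based `segment` stands in for the paper's semantic segmentation network, and a single hand-rolled LSTM cell with random weights stands in for the CNN-LSTM; none of this reproduces the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment(frame, threshold=0.5):
    # Step 1 (stand-in): binary "urban" mask from a grayscale frame.
    # The paper uses a learned semantic segmentation model; a fixed
    # threshold is only a placeholder.
    return (frame > threshold).astype(np.float32)

def lstm_step(x, h, c, Wx, Uh, b):
    # One LSTM step; gates stacked as [input, forget, cell, output].
    z = Wx @ x + Uh @ h + b
    n = h.size
    i = 1.0 / (1.0 + np.exp(-z[:n]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2*n]))     # forget gate
    g = np.tanh(z[2*n:3*n])                 # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3*n:]))      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

T, H, W, hidden = 5, 8, 8, 16
frames = rng.random((T, H, W))              # toy time series of satellite frames
Wx = rng.standard_normal((4 * hidden, H * W)) * 0.1
Uh = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(T):
    feat = segment(frames[t]).ravel()       # Step 2: temporal model over masks
    h, c = lstm_step(feat, h, c, Wx, Uh, b)

print(h.shape)  # final temporal state summarising the expansion sequence
```

A real implementation would replace `segment` with a trained segmentation network and feed CNN feature maps, rather than raw masks, into the LSTM.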

Wind Speed Forecasting Based on Data Decomposition and Deep Learning Models: A Case Study of a Wind Farm in Saudi Arabia

arXiv ·

A novel wind speed forecasting (WSF) framework is proposed, combining Wavelet Packet Decomposition (WPD), the Seasonal Adjustment Method (SAM), and a Bidirectional Long Short-Term Memory network (BiLSTM). SAM removes the seasonal component from the sub-series generated by WPD, reducing forecasting complexity. The model was tested on five years of hourly wind speed observations from the Dumat Al-Jandal wind farm in Al-Jouf, Saudi Arabia, achieving high forecasting accuracy.
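The seasonal-adjustment step can be illustrated with a toy additive adjustment in NumPy. The paper applies SAM to the WPD sub-series before the BiLSTM; here, for brevity, a single synthetic hourly series with a 24-hour cycle is deseasonalised by subtracting the per-hour mean profile. This additive form is an assumption for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hourly wind-speed series: a 24-hour seasonal component plus noise.
hours = np.arange(24 * 60)
series = 8.0 + 2.0 * np.sin(2 * np.pi * hours / 24) \
             + 0.3 * rng.standard_normal(hours.size)

def seasonal_adjust(x, period=24):
    # Additive seasonal adjustment: estimate the mean profile at each
    # phase of the cycle, then subtract it from the series.
    profile = np.array([x[p::period].mean() for p in range(period)])
    adjusted = x - profile[np.arange(x.size) % period]
    return adjusted, profile

adjusted, profile = seasonal_adjust(series)
# The adjusted series should vary far less than the raw one,
# which is what makes it easier for the BiLSTM to forecast.
print(np.std(adjusted) < np.std(series))
```

In the full framework each WPD sub-series would be adjusted this way, forecast by a BiLSTM, and the components recombined.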

Learning Time-Series Representations by Hierarchical Uniformity-Tolerance Latent Balancing

arXiv ·

The paper introduces TimeHUT, a new method for learning time-series representations via hierarchical uniformity-tolerance balancing of contrastive representations. TimeHUT employs a hierarchical setup to learn both instance-wise and temporal information, along with a temperature scheduler that balances uniformity against tolerance. The method was evaluated on the UCR, UEA, Yahoo, and KPI datasets, demonstrating superior performance in classification and competitive results in anomaly detection.
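The summary does not specify TimeHUT's exact schedule, so the cosine schedule below is only one plausible form, included to illustrate the knob being scheduled: low temperatures sharpen the contrastive softmax (pushing representations toward uniformity on the hypersphere), while high temperatures soften it (tolerating semantically close negatives).

```python
import math

def cosine_temperature(epoch, total_epochs, t_min=0.07, t_max=0.7):
    # Anneal from t_max down to t_min over training. The endpoint values
    # are illustrative defaults common in contrastive learning, not
    # TimeHUT's published settings.
    frac = epoch / max(total_epochs - 1, 1)
    return t_min + 0.5 * (t_max - t_min) * (1.0 + math.cos(math.pi * frac))

temps = [cosine_temperature(e, 10) for e in range(10)]
print(round(temps[0], 2), round(temps[-1], 2))  # starts at t_max, ends at t_min
```

The returned temperature would divide the similarity logits in an InfoNCE-style loss, so early training tolerates near-duplicates and late training enforces uniformity.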

Self-supervised DNA models and scalable sequence processing with memory augmented transformers

MBZUAI ·

Dr. Mikhail Burtsev of the London Institute presented research on GENA-LM, a suite of transformer-based DNA language models. The talk addressed the challenge of scaling transformers for genomic sequences, proposing recurrent memory augmentation to handle long input sequences efficiently. This approach improves language modeling performance and holds promise for memory-intensive applications in bioinformatics. Why it matters: This research can significantly advance AI's capabilities in genomics by enabling the processing of much larger DNA sequences, with potential breakthroughs in understanding and treating diseases.
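Recurrent memory augmentation can be caricatured as segment-wise processing with a carried memory state: a long sequence is split into fixed-length segments, and a compact memory is read and rewritten at each step instead of attending over the whole input. The sketch below deliberately replaces the transformer block with a trivial pooling update; it shows only the control flow, not GENA-LM's actual model.

```python
import numpy as np

def process_segment(segment, memory):
    # Stand-in for a transformer block with memory tokens: an exponential
    # moving average of segment summaries, so the memory accumulates
    # information from every segment seen so far.
    return 0.5 * memory + 0.5 * segment.mean(axis=0)

def recurrent_memory_pass(tokens, seg_len, d):
    # Process the sequence in O(len/seg_len) fixed-size chunks, carrying
    # only a d-dimensional memory between them.
    memory = np.zeros(d)
    for start in range(0, len(tokens), seg_len):
        memory = process_segment(tokens[start:start + seg_len], memory)
    return memory

rng = np.random.default_rng(2)
tokens = rng.random((1000, 32))   # a "long" token-embedding sequence
mem = recurrent_memory_pass(tokens, seg_len=128, d=32)
print(mem.shape)                  # fixed-size summary regardless of length
```

The point of the pattern is that compute per step stays bounded by the segment length, which is what makes very long genomic sequences tractable.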

Overcoming the ‘reversal curse’ in LLMs with ReCall

MBZUAI ·

MBZUAI researchers identified 'self-referencing causal cycles' in LLM training data that can mitigate the 'reversal curse,' where LLMs struggle with information presented in reverse order. The study, to be presented at ACL, explains that the transformer architecture's unidirectional token generation causes this issue. By leveraging the repetitive nature of information in training texts, the team developed an efficient solution to improve LLM performance. Why it matters: Overcoming the reversal curse can significantly enhance LLM accuracy and reliability, especially in tasks requiring bidirectional reasoning and understanding of context.

What Really Counts: Theoretical and Empirical Aspects of Counting Behaviour in Simple RNNs

MBZUAI ·

Nadine El Naggar from City, University of London presented research on RNN learning of counting behaviour, formalized as Dyck-1 acceptance. Empirically, RNN models struggle to learn exact counting and fail on longer sequences, even when weights are correctly initialized. Theoretically, Counter Indicator Conditions (CICs) were proposed and proven necessary and sufficient for exact counting in single-cell RNNs, but experiments show these CICs are not found, or are unlearned, during training. Why it matters: This work highlights the difficulty RNNs have with learning systematic tasks, suggesting that gradient-descent-based optimization may not achieve exact counting behaviour under standard setups.
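The target behaviour is easy to state by construction: a single-cell RNN with hand-set unit weights implements an exact Dyck-1 counter, incrementing on `(` and decrementing on `)`. This hand-built solution (a simplification of the talk's single-cell RNN setting) is the kind of exactly-counting configuration the CICs characterise; the finding above is that gradient descent tends not to reach or retain it.

```python
def dyck1_accept(s):
    # Hand-set single-cell "RNN": the hidden state h counts open brackets.
    # Input encoding: '(' -> +1, ')' -> -1; input and recurrent weights
    # are both fixed at 1, so h_t = h_{t-1} + x_t.
    h = 0
    for ch in s:
        h += 1 if ch == '(' else -1
        if h < 0:           # a closing bracket with no match: reject early
            return False
    return h == 0           # accept only if every bracket was closed

print(dyck1_accept("(()())"), dyck1_accept("())("))  # True False
```

Because the counter is exact for any depth, this construction generalises to arbitrarily long sequences, which is precisely where the empirically trained RNNs in the talk were reported to fail.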

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length and multimodal richness, using human-validated samples. LongShOTAgent, an agentic system, is also presented for analyzing long videos, with both the benchmark and agent demonstrating the challenges faced by state-of-the-art MLLMs.