GCC AI Research

Results for "long-context modeling"

AraModernBERT: Transtokenized Initialization and Long-Context Encoder Modeling for Arabic

arXiv ·

The paper introduces AraModernBERT, an adaptation of the ModernBERT encoder architecture for Arabic, focusing on transtokenized embedding initialization and long-context modeling up to 8,192 tokens. Transtokenization is shown to be crucial for Arabic language modeling, significantly enhancing masked language modeling performance. The model demonstrates stable and effective long-context modeling, improving intrinsic language modeling performance at extended sequence lengths. Why it matters: This research provides practical insights for adapting modern encoder architectures to Arabic and other languages using Arabic-derived scripts, advancing Arabic NLP.

Modeling Text as a Living Object

MBZUAI ·

The InterText project, funded by the European Research Council, aims to advance NLP by developing a framework for modeling fine-grained relationships between texts. This approach enables tracing the origin and evolution of texts and ideas. Iryna Gurevych from the Technical University of Darmstadt presented the intertextual approach to NLP, covering data modeling, representation learning, and practical applications. Why it matters: This research could enable a new generation of AI applications for text work and critical reading, with potential applications in collaborative knowledge construction and document revision assistance.

NLP for Long, Structured Documents

MBZUAI ·

Jan Buchmann from TU Darmstadt presented research on NLP for long, structured documents at MBZUAI. The research addresses gaps in using document structure and improving the verifiability of LM responses. Experiments showed that models learn to represent document structure during pre-training, and larger models can cite sources well. Why it matters: This research contributes to making NLP more effective for complex documents like scientific articles and legal texts, which is crucial for information accessibility.

Beyond Attention: Orchid’s Adaptive Convolutions for Next-Level Sequence Modeling

MBZUAI ·

Orchid is a new neural network architecture that uses adaptive convolutions to achieve quasilinear computational complexity, O(N log N), for sequence modeling. It adapts its convolution kernel dynamically based on the input sequence. Evaluations across language modeling and image classification show that Orchid outperforms attention-based architectures such as BERT and Vision Transformers, often with smaller model sizes. Why it matters: Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, representing progress toward more efficient and scalable deep learning models.
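
To make the O(N log N) figure concrete, here is a minimal sketch of a length-N convolution computed through the FFT, with a kernel that depends on the input. It is not Orchid's actual architecture: the function names and the `tanh`-of-the-mean conditioning rule are hypothetical, chosen only to illustrate the general idea of an input-dependent kernel applied in quasilinear time.

```python
import numpy as np

def fft_circular_conv(x, kernel):
    """Circular convolution of two length-N real sequences.

    The FFT costs O(N log N), versus O(N^2) for the direct sum.
    """
    n = x.shape[-1]
    return np.fft.irfft(np.fft.rfft(x, n=n) * np.fft.rfft(kernel, n=n), n=n)

def toy_adaptive_kernel(x, base_kernel):
    """Hypothetical conditioning rule (an assumption, not Orchid's method):
    rescale a fixed base kernel by a statistic of the input sequence."""
    gate = np.tanh(np.mean(x))
    return base_kernel * (1.0 + gate)

def direct_circular_conv(x, kernel):
    """O(N^2) reference implementation, used only to check the FFT version."""
    n = len(x)
    return np.array(
        [sum(x[j] * kernel[(i - j) % n] for j in range(n)) for i in range(n)]
    )

def adaptive_conv_layer(x, base_kernel):
    """Build the input-dependent kernel, then apply it via the FFT."""
    return fft_circular_conv(x, toy_adaptive_kernel(x, base_kernel))
```

Calling `adaptive_conv_layer(x, base_kernel)` with two NumPy arrays of equal length returns an array of the same length. Because the kernel is recomputed from `x` on every call, two different inputs are filtered by two different kernels, which is the sense in which such a convolution is adaptive.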

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length and multimodal richness, using human-validated samples. LongShOTAgent, an agentic system for analyzing long videos, is also presented. Results for both the benchmark and the agent highlight the challenges that state-of-the-art MLLMs face.

Self-supervised DNA models and scalable sequence processing with memory augmented transformers

MBZUAI ·

Dr. Mikhail Burtsev of the London Institute presented research on GENA-LM, a suite of transformer-based DNA language models. The talk addressed the challenge of scaling transformers for genomic sequences, proposing recurrent memory augmentation to handle long input sequences efficiently. This approach improves language modeling performance and holds promise for memory-intensive applications in bioinformatics. Why it matters: This research can significantly advance AI's capabilities in genomics by enabling the processing of much larger DNA sequences, with potential breakthroughs in understanding and treating diseases.