Skip to content
GCC AI Research

Search

Results for "model architecture"

Beyond Attention: Orchid’s Adaptive Convolutions for Next-Level Sequence Modeling

MBZUAI ·

A new neural network architecture called Orchid was introduced that uses adaptive convolutions to achieve quasilinear computational complexity O(N logN) for sequence modeling. Orchid adapts its convolution kernel dynamically based on the input sequence. Evaluations across language modeling and image classification show that Orchid outperforms attention-based architectures like BERT and Vision Transformers, often with smaller model sizes. Why it matters: Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, representing progress toward more efficient and scalable deep learning models.

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

arXiv ·

The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.

Making computer vision more efficient with state-space models

MBZUAI ·

MBZUAI researchers developed GroupMamba, a new set of state-space models (SSMs) for computer vision that addresses limitations in existing SSMs related to computational efficiency and optimization challenges. GroupMamba introduces a new layer called modulated group mamba, improving efficiency and stability. In benchmark tests, GroupMamba performed as well as similar SSM systems, but more efficiently, offering a backbone for tasks like image classification, object detection, and segmentation. Why it matters: This research aims to bridge the gap between vision transformers and CNNs by improving SSMs, potentially leading to more efficient and powerful computer vision models.

Duet: efficient and scalable hybriD neUral rElation undersTanding

arXiv ·

The paper introduces Duet, a hybrid neural relation understanding method for cardinality estimation. Duet addresses limitations of existing learned methods, such as high costs and scalability issues, by incorporating predicate information into an autoregressive model. Experiments demonstrate Duet's efficiency, accuracy, and scalability, even outperforming GPU-based methods on CPU.

LLMs 101: Large language models explained

MBZUAI ·

The article provides a basic overview of large language models (LLMs), explaining their functionality and applications. LLMs are AI systems that process and generate human-like text using transformer architecture, trained on vast datasets to predict the next word in a sequence. The piece differentiates between general-purpose, task-specific, and multimodal models, as well as closed-source and open-source LLMs. Why it matters: LLMs are foundational for advancements in Arabic NLP, as evidenced by models like MBZUAI's Jais, and understanding their mechanics is crucial for regional AI development.

QCRI Machine Translation Systems for IWSLT 16

arXiv ·

This paper describes QCRI's machine translation systems for the IWSLT 2016 evaluation campaign, focusing on Arabic-English and English-Arabic tracks. They built both Phrase-based and Neural machine translation models. A Neural MT system, trained by stacking data from different genres through fine-tuning, and applying ensemble over 8 models, outperformed a strong phrase-based system by 2 BLEU points in the Arabic->English direction. Why it matters: The research highlights the early promise of neural machine translation for Arabic language pairs, demonstrating its potential to surpass traditional methods.