
Beyond Attention: Orchid’s Adaptive Convolutions for Next-Level Sequence Modeling

MBZUAI

Orchid is a new neural network architecture that uses adaptive convolutions to perform sequence modeling with quasilinear computational complexity, O(N log N) in the sequence length N. Unlike a fixed convolution, Orchid’s convolution kernel is computed dynamically from the input sequence. Evaluations on language modeling and image classification show that Orchid outperforms attention-based architectures such as BERT and Vision Transformers, often with smaller model sizes. Why it matters: dense attention scales quadratically with sequence length, so Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, a step toward more efficient and scalable deep learning models.
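The two ideas in this summary, a kernel computed from the input and an O(N log N) cost, can be illustrated with a toy sketch. It assumes the convolution is evaluated with the fast Fourier transform (zero-padding to at least 2N so the causal convolution does not wrap around), which is the standard way to get quasilinear long convolutions. The kernel generator `make_kernel` is purely hypothetical: it is a made-up decaying filter whose decay rate depends on the input, not Orchid’s actual conditioning network. A plain-Python radix-2 FFT keeps the example self-contained.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True it computes the unnormalized inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_conv(x, h):
    """Causal convolution y[t] = sum_{s<=t} h[s] * x[t-s], truncated to len(x).
    Zero-padding to a power of two >= 2N avoids circular wrap-around,
    so the cost is O(N log N) instead of O(N^2)."""
    n = len(x)
    size = 1
    while size < 2 * n:
        size *= 2
    X = fft([complex(v) for v in x] + [0j] * (size - n))
    H = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    Y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [(v / size).real for v in Y[:n]]


def direct_conv(x, h):
    """Reference O(N^2) causal convolution, used only for checking."""
    n = len(x)
    return [sum(h[s] * x[t - s] for s in range(t + 1)) for t in range(n)]


def make_kernel(x):
    """HYPOTHETICAL data-dependent kernel: an exponentially decaying filter
    whose decay rate depends on the input's mean absolute value. This only
    illustrates 'kernel computed from the input'; it is not Orchid's method."""
    n = len(x)
    scale = sum(abs(v) for v in x) / n
    decay = 1.0 / (1.0 + scale)
    return [decay ** t for t in range(n)]


def adaptive_conv(x):
    """Generate a kernel from the input, then apply it with an FFT convolution."""
    return fft_conv(x, make_kernel(x))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 0.5]
    fast = adaptive_conv(x)
    slow = direct_conv(x, make_kernel(x))
    print(max(abs(a - b) for a, b in zip(fast, slow)))  # tiny float error only
```

Because the kernel is regenerated for every input, the filter changes with the data, yet each application is still a single FFT-based convolution, so the overall cost stays quasilinear. A real model would produce the kernel with a learned network and use an optimized FFT library rather than this recursive version.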
