Beyond Attention: Orchid’s Adaptive Convolutions for Next-Level Sequence Modeling
MBZUAI · Notable
Summary
Orchid is a new neural network architecture that uses adaptive convolutions to achieve quasilinear O(N log N) computational complexity for sequence modeling. Instead of using a fixed filter, Orchid generates its convolution kernel dynamically from the input sequence, so the filter adapts to the data it processes. Evaluations on language modeling and image classification show Orchid outperforming attention-based architectures such as BERT and Vision Transformers, often at smaller model sizes. Why it matters: by avoiding the quadratic cost of dense attention, Orchid extends the feasible sequence length beyond the practical limits of attention layers, a step toward more efficient and scalable deep learning models.
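To make the mechanism concrete, here is a minimal PyTorch sketch of the general idea: a filter conditioned on the input itself, applied via FFT so the cost grows as O(N log N) rather than the O(N²) of dense attention. The class name `AdaptiveFFTConv`, the `kernel_net` generator, and the mean-pooled conditioning are illustrative assumptions for this sketch, not Orchid's published layer.

```python
import torch
import torch.nn as nn


class AdaptiveFFTConv(nn.Module):
    """Sketch of an input-conditioned global convolution in O(N log N).

    Illustrative simplification of a data-dependent convolution,
    not Orchid's exact layer.
    """

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.seq_len = seq_len
        self.d_model = d_model
        # Hypothetical kernel generator: a pooled summary of the input
        # is mapped to one length-N filter per channel, so the kernel
        # depends on the sequence being processed.
        self.kernel_net = nn.Linear(d_model, seq_len * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, N, D = x.shape
        summary = x.mean(dim=1)                      # (B, D) global summary
        k = self.kernel_net(summary).view(B, N, D)   # data-dependent kernel

        # Circular convolution via FFT: an elementwise product in the
        # frequency domain costs O(N log N) instead of O(N^2).
        return torch.fft.irfft(
            torch.fft.rfft(x, n=N, dim=1) * torch.fft.rfft(k, n=N, dim=1),
            n=N, dim=1,
        )


x = torch.randn(2, 1024, 64)                  # batch of 2, sequence length 1024
y = AdaptiveFFTConv(d_model=64, seq_len=1024)(x)
print(y.shape)                                # torch.Size([2, 1024, 64])
```

Because the filter spans the full sequence, a single layer mixes information globally, which is what lets a convolutional design compete with attention at a fraction of the cost.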
Keywords
Orchid · convolution · sequence modeling · BERT · Transformer