Search

Results for "parameter-efficient learning"

Parameter-Efficient Fine-Tuning for NLP Models

MBZUAI · Invalid Date

The article discusses parameter-efficient fine-tuning methods for large NLP models, highlighting their importance due to the increasing size and computational demands of state-of-the-art language models. It provides an overview of these methods, presenting them in a unified view to emphasize their similarities and differences. Indraneil, a PhD candidate at TU Darmstadt's UKP Lab, is researching parameter-efficient fine-tuning, sparsity, and conditional computation methods to improve LLM performance in multilingual, multi-task settings. Why it matters: Efficient fine-tuning techniques are crucial for democratizing access to and accelerating the deployment of large language models in the region and beyond.

Adapting AI to identify Arabic dialects

KAUST · Aug 29

KAUST researchers have developed a parameter-efficient learning approach to identify Arabic dialects using limited data and computing power, fine-tuning the Whisper model with a dataset of 17 dialects. The model achieves high accuracy using only 2.5% of the parameters of the larger model and 30% of the training data. Srijith Radhakrishnan presented the findings at EMNLP 2023 and Interspeech 2023. Why it matters: This research addresses the challenge of dialect identification in Arabic NLP and enables more efficient use of large language models in resource-constrained environments.

SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation

arXiv · Mar 20

Researchers introduce SALT, a parameter-efficient fine-tuning method for medical image segmentation that combines singular value adaptation with low-rank transformation. SALT selectively adapts influential singular values and complements this with a low-rank update for the remaining subspace. Experiments on five medical datasets show SALT outperforms state-of-the-art PEFT methods by 2-5% in Dice score with only 3.9% trainable parameters.

Training Deep Neural Networks in Tiny Subspaces

MBZUAI · Invalid Date

Xiaolin Huang from Shanghai Jiao Tong University presented a talk at MBZUAI on training deep neural networks in tiny subspaces. The talk covered the low-dimension hypothesis in neural networks and methods to find subspaces for efficient training. It suggests that training in smaller subspaces can improve training efficiency, generalization, and robustness. Why it matters: Investigating efficient training methods is crucial for resource-constrained environments and can enable broader access to advanced AI.

Green Learning — New Generation Machine Learning and Applications

MBZUAI · Invalid Date

A recent talk at MBZUAI discussed "Green Learning" and Operational Neural Networks (ONNs) as efficient alternatives to CNNs. ONNs use "nodal" and "pool" operators and "generative neurons" to expand neuron learning capacity. Moncef Gabbouj from Tampere University presented Self-Organized ONNs (Self-ONNs) and their signal processing applications. Why it matters: Exploring more efficient AI models is crucial for sustainable development of AI in the region, as it addresses computational resource constraints and promotes broader accessibility.

Distillation Policy Optimization

arXiv · Feb 1

The paper introduces a novel actor-critic framework called Distillation Policy Optimization that combines on-policy and off-policy data for reinforcement learning. It incorporates variance reduction mechanisms like a unified advantage estimator (UAE) and a residual baseline. The empirical results demonstrate improved sample efficiency for on-policy algorithms, bridging the gap with off-policy methods.

Beyond Attention: Orchid’s Adaptive Convolutions for Next-Level Sequence Modeling

MBZUAI · Invalid Date

A new neural network architecture called Orchid was introduced that uses adaptive convolutions to achieve quasilinear computational complexity O(N logN) for sequence modeling. Orchid adapts its convolution kernel dynamically based on the input sequence. Evaluations across language modeling and image classification show that Orchid outperforms attention-based architectures like BERT and Vision Transformers, often with smaller model sizes. Why it matters: Orchid extends the feasible sequence length beyond the practical limits of dense attention layers, representing progress toward more efficient and scalable deep learning models.