Search

Results for "QLoRA"

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

arXiv · Jun 5

The paper introduces Sparse-Quantized Representation (SpQR), a new compression format and quantization technique for large language models (LLMs). SpQR identifies outlier weights and stores them in higher precision while compressing the remaining weights to 3-4 bits. The method achieves less than 1% accuracy loss in perplexity for LLaMA and Falcon LLMs and enables a 33B parameter LLM to run on a single 24GB consumer GPU. Why it matters: This enables near-lossless compression of LLMs, making powerful models accessible on resource-constrained devices and accelerating inference without significant accuracy degradation.

Resource-Aware Arabic LLM Creation: Model Adaptation, Integration, and Multi-Domain Testing

arXiv · Dec 23

Researchers fine-tuned the Qwen2-1.5B model for Arabic using QLoRA on a 4GB VRAM system, using datasets like Bactrian and Arabic Wikipedia. They addressed challenges in Arabic NLP including morphology and dialectal variations. After 10,000 training steps, the final loss converged to 0.1083 with improved handling of Arabic-specific linguistic phenomena. Why it matters: This demonstrates a resource-efficient approach for creating specialized Arabic language models, democratizing access to advanced NLP technologies.

QU-NLP at QIAS 2025 Shared Task: A Two-Phase LLM Fine-Tuning and Retrieval-Augmented Generation Approach for Islamic Inheritance Reasoning

arXiv · Aug 20

The QU-NLP team presented their approach to the QIAS 2025 shared task on Islamic Inheritance Reasoning, fine-tuning the Fanar-1-9B model using LoRA and integrating it into a RAG pipeline. Their system achieved an accuracy of 0.858 on the final test, outperforming models like GPT 4.5, LLaMA, and Mistral in zero-shot settings. The system particularly excelled in advanced reasoning, achieving 97.6% accuracy. Why it matters: This demonstrates the effectiveness of domain-specific fine-tuning and retrieval augmentation for Arabic LLMs in complex reasoning tasks, even surpassing frontier models.

RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

arXiv · Apr 10

RightNow-Arabic-0.5B-Turbo is a new 518M-parameter Arabic-specialized decoder LLM, built on Qwen2.5-0.5B, designed to bridge the gap between small multilingual and large Arabic-specialized models. Its development pipeline included adding 27,032 Arabic tokens via vocabulary injection, continued pretraining on 504M Arabic tokens, and fine-tuning with supervised instruction and direct preference optimization. The model achieved a 35.9% mean accuracy on three Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU), outperforming all same-class open models and recovering 67% of SILMA-9B's mean accuracy at 1/18 the parameters, with all code and weights publicly released. Why it matters: This model significantly advances efficient Arabic NLP by providing a powerful, specialized sub-1B LLM suitable for edge deployment, making advanced Arabic AI more accessible and performant on resource-constrained devices.