GCC AI Research


Results for "Information Retrieval"

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring

arXiv ·

This paper introduces an enhanced Dense Passage Retrieval (DPR) framework tailored for Arabic text retrieval. The core innovation is an Attentive Relevance Scoring (ARS) mechanism that improves semantic relevance modeling between questions and passages, replacing standard interaction methods. The method integrates pre-trained Arabic language models and architectural refinements, achieving improved retrieval and ranking accuracy for Arabic question answering. Why it matters: This work addresses the underrepresentation of Arabic in NLP research by providing a novel approach and publicly available code to improve Arabic text retrieval, which can benefit various applications like Arabic search engines and question-answering systems.
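The paper's exact ARS formulation is not given in this summary, but the idea of replacing a plain dot-product interaction with attention-based relevance scoring can be sketched as follows. This is a minimal hypothetical illustration, not the authors' implementation: attention weights over passage token embeddings are derived from their similarity to the question embedding, and the attended passage vector is then scored against the question.

```python
import numpy as np

def attentive_relevance_score(q, P):
    """Hypothetical sketch of attention-based relevance scoring.

    q: (d,) question embedding; P: (n, d) passage token embeddings.
    Tokens similar to the question receive higher attention weight,
    and the attention-pooled passage vector is scored by dot product.
    """
    logits = P @ q                                # (n,) token-question similarities
    logits = logits - logits.max()                # numerical stability for softmax
    attn = np.exp(logits) / np.exp(logits).sum()  # softmax over passage tokens
    passage_vec = attn @ P                        # (d,) attention-pooled passage
    return float(q @ passage_vec)                 # final relevance score
```

In a standard DPR bi-encoder the score would be a single dot product between pooled question and passage vectors; the sketch above differs only in pooling the passage with question-conditioned attention before scoring.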

Aligning Dense Retrievers with LLM Utility via Distillation

arXiv ·

Researchers proposed Utility-Aligned Embeddings (UAE), a new framework to improve dense vector retrieval for Retrieval-Augmented Generation (RAG) by aligning it with LLM utility. UAE trains a bi-encoder to imitate an LLM's utility distribution, derived from perplexity reduction, using a Utility-Modulated InfoNCE objective. On the QASPER benchmark, UAE achieved a 30.59% improvement in Recall@1 and was over 180 times faster than efficient LLM re-ranking methods while preserving competitive performance. Why it matters: This approach offers a significant leap in RAG efficiency and accuracy, providing a method to align retrieval with generative utility without test-time LLM inference, which could enable more scalable and precise LLM applications.
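The summary describes training the bi-encoder against the LLM's utility distribution rather than a one-hot positive. A minimal sketch of what such a utility-modulated InfoNCE objective might look like (the paper's actual loss and temperatures are assumptions here): the retriever's softmax over candidate similarities is pushed, via cross-entropy, toward a softmax over LLM utility signals such as perplexity reduction.

```python
import numpy as np

def utility_modulated_infonce(sims, utilities, tau=0.05, tau_u=1.0):
    """Hypothetical sketch of a utility-modulated InfoNCE loss.

    sims: (k,) retriever similarity scores for k candidate passages.
    utilities: (k,) LLM utility signals (e.g. perplexity reduction).
    The one-hot InfoNCE target is replaced by a softmax over the
    utilities, and the loss is the cross-entropy between the
    retriever's distribution and that utility distribution.
    """
    def softmax(x, t):
        z = x / t
        z = z - z.max()          # numerical stability
        e = np.exp(z)
        return e / e.sum()

    p = softmax(sims, tau)       # retriever's distribution over candidates
    q = softmax(utilities, tau_u)  # target distribution from LLM utility
    return float(-(q * np.log(p + 1e-12)).sum())  # cross-entropy H(q, p)
```

When the utility distribution is one-hot, this reduces to standard InfoNCE; softer utility targets let passages that partially help the LLM contribute to the gradient.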

Retrieval Augmentation as a Shortcut to the Training Data

MBZUAI ·

This article discusses retrieval augmentation in text generation, where information retrieved from an external source is used to condition predictions. It references recent work on retrieval-augmented image captioning, showing that model size can be greatly reduced when training data is made available through retrieval. The author intends to continue this work, focusing on the intersection of retrieval augmentation and in-context learning, and on controllable image captioning for language learning materials. Why it matters: This research direction has the potential to improve transfer learning in vision-language models, which could be especially relevant for downstream applications in Arabic NLP and multimodal tasks.
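The core mechanism described above, conditioning a generator on text retrieved from an external datastore, can be sketched in a few lines. All names and the cosine-similarity retriever below are illustrative assumptions, not the article's implementation:

```python
import numpy as np

def retrieve_and_condition(query_vec, index_vecs, index_texts, query_text, k=2):
    """Hypothetical sketch of retrieval augmentation for generation.

    Retrieves the k nearest items from an external datastore by cosine
    similarity and prepends them to the input, so the generator can
    condition its predictions on retrieved text instead of storing all
    knowledge in its parameters.
    """
    # Cosine similarity between the query and every indexed item.
    norms = np.linalg.norm(index_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (index_vecs @ query_vec) / np.maximum(norms, 1e-12)
    top = np.argsort(-sims)[:k]          # indices of the k most similar items
    context = "\n".join(index_texts[i] for i in top)
    # The generator receives retrieved context plus the query as its input.
    return f"{context}\n\n{query_text}"
```

In the retrieval-augmented captioning setting referenced above, the datastore would hold captions from the training data, which is what allows a much smaller model to perform competitively.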