Results for "RegNLP"

RIRAG: Regulatory Information Retrieval and Answer Generation

arXiv · Sep 9

Researchers introduce a new task for generating question-passage pairs to aid in developing regulatory question-answering (QA) systems. They present the ObliQA dataset, comprising 27,869 questions drawn from Abu Dhabi Global Market (ADGM) financial regulations, and design a baseline Regulatory Information Retrieval and Answer Generation (RIRAG) system, which they evaluate using the RePASs metric.

AraFinNLP 2024: The First Arabic Financial NLP Shared Task

arXiv · Jul 13

The AraFinNLP 2024 shared task introduced two subtasks focused on Arabic financial NLP: multi-dialect intent detection and cross-dialect translation with intent preservation. It used the updated ArBanking77 dataset, which contains 39k parallel queries in Modern Standard Arabic (MSA) and four dialects, labeled with 77 banking-related intents. Of the 45 registered teams, 11 took part in intent detection (top F1 score: 0.8773), and only one attempted translation (BLEU score: 1.667). Why it matters: This initiative addresses the need for specialized Arabic NLP tools in the growing Arab financial sector, promoting advancements in areas like banking chatbots and machine translation.

A Panoramic Survey of Natural Language Processing in the Arab World

arXiv · Nov 25

This survey paper reviews the landscape of Natural Language Processing (NLP) research and applications in the Arab world. It discusses the unique challenges posed by the Arabic language, such as its morphological complexity and dialectal diversity. The paper also presents a historical overview of Arabic NLP and surveys various research areas, including machine translation, sentiment analysis, and speech recognition. Why it matters: The survey provides a comprehensive resource for researchers and practitioners interested in the current state and future directions of Arabic NLP, a field critical for enabling AI technologies to serve Arabic-speaking communities.

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

arXiv · Nov 2

A new method is proposed to reduce the verbosity of LLMs in step-by-step reasoning by retaining moderately easy problems during Reinforcement Learning with Verifiable Rewards (RLVR) training. These problems act as an implicit length regularizer, preventing the model from excessively increasing output length on harder problems. In experiments with Qwen3-4B-Thinking-2507, the model matches baseline accuracy with solutions nearly half as long.
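A rough illustration of what "retaining moderately easy problems" can mean in practice: RLVR pipelines often curate the training pool by each problem's pass rate (the fraction of sampled rollouts the verifier marks correct). The sketch below assumes GRPO-style group-normalized advantages and uses hypothetical thresholds (`hard_max`, `easy_max`) and a hypothetical `curate` helper; it is not the paper's exact recipe, only a contrast between a hard-only filter and one that keeps moderately easy problems.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Problem:
    prompt: str
    pass_rate: float  # fraction of sampled rollouts the verifier marked correct


def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages (an assumption here): rewards normalized within one group."""
    mu, sd = mean(rewards), pstdev(rewards)
    if sd == 0:
        return [0.0] * len(rewards)  # identical rewards give no learning signal
    return [(r - mu) / sd for r in rewards]


def curate(problems: List[Problem], retain_easy: bool,
           hard_max: float = 0.5, easy_max: float = 0.9) -> List[Problem]:
    """Hypothetical difficulty filter for an RLVR training pool."""
    kept = []
    for p in problems:
        if p.pass_rate <= 0.0 or p.pass_rate >= 1.0:
            continue  # all-wrong or all-right groups have zero advantage
        if p.pass_rate <= hard_max:
            kept.append(p)  # hard problems are always kept
        elif retain_easy and p.pass_rate <= easy_max:
            kept.append(p)  # moderately easy problems kept only when requested
    return kept


pool = [
    Problem("p1", 0.0),
    Problem("p2", 0.25),
    Problem("p3", 0.75),
    Problem("p4", 0.95),
    Problem("p5", 1.0),
]
print([p.prompt for p in curate(pool, retain_easy=False)])  # ['p2']
print([p.prompt for p in curate(pool, retain_easy=True)])   # ['p2', 'p3']
```

Under this reading, the moderately easy problems are ones the model usually solves, so their rewarded rollouts tend to be short correct answers; keeping them in the pool is what would counterbalance the length growth driven by hard problems.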

A Case Study for Compliance as Code with Graphs and Language Models: Public release of the Regulatory Knowledge Graph

arXiv · Feb 3

This paper introduces a Regulatory Knowledge Graph (RKG) for the Abu Dhabi Global Market (ADGM) regulations, constructed using language models and graph technologies. A portion of the regulations was manually tagged to train BERT-based models, which were then applied to the rest of the corpus. The resulting knowledge graph, stored in Neo4j, is open-sourced on GitHub along with the accompanying code to promote advancements in compliance automation.

Addressing NLP problems in low resource settings

MBZUAI

Thamar Solorio from the University of Houston will discuss machine learning approaches for spontaneous human language processing. The talk will cover adapting multilingual transformers to code-switching data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also provide an overview of other research projects at the RiTUAL lab, focusing on the scarcity of labeled data. Why it matters: This presentation addresses key challenges in Arabic NLP related to data scarcity, which is a persistent obstacle in developing effective AI applications for the region.

User-Centric Gender Rewriting

MBZUAI

NYU and NYU Abu Dhabi researchers are working on user-centric gender rewriting in NLP, especially for Arabic. They are building an Arabic Parallel Gender Corpus and developing models for gender rewriting tasks. The work aims to address representational harms caused by NLP systems that don't account for user preferences regarding grammatical gender. Why it matters: This research promotes fairness and inclusivity in Arabic NLP by enabling systems to generate gender-specific outputs based on user preferences, mitigating biases present in training data.