The AraFinNLP 2024 shared task introduced two subtasks focused on Arabic financial NLP: multi-dialect intent detection and cross-dialect translation with intent preservation. It used the updated ArBanking77 dataset of 39k parallel queries in MSA and four dialects, labeled with 77 banking-related intents. Of the 45 teams that registered, 11 participated in intent detection (top F1 score: 0.8773), and only one attempted translation (BLEU score: 1.667). Why it matters: This initiative addresses the need for specialized Arabic NLP tools in the growing Arab financial sector, promoting advancements in areas like banking chatbots and machine translation.
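Intent-detection systems in tasks like this are ranked by F1 over the intent labels. A minimal sketch of macro-averaged F1, with toy banking intents for illustration (the specific averaging variant the task used is an assumption here):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute per-intent F1, then average the
    per-intent scores with equal weight, so rare intents count as
    much as frequent ones."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# toy example with three hypothetical banking intents
gold = ["card_lost", "card_lost", "balance", "transfer"]
pred = ["card_lost", "balance", "balance", "transfer"]
print(round(macro_f1(gold, pred), 3))  # → 0.778
```

The set-based label collection makes the metric robust to intents that appear only in predictions (they contribute an F1 of 0, dragging the average down rather than being silently ignored).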
MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework capable of classifying text as human-written, LLM-generated, or collaboratively written. FAID utilizes multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially in generalizing to unseen domains and new LLMs, and includes a multilingual, multi-domain dataset called FAIDSet.
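The contrastive component can be pictured as a supervised contrastive loss over authorship labels: embeddings of texts with the same authorship class (human, LLM-generated, collaborative) are pulled together and others pushed apart. The sketch below is a simplified stand-in for FAID's actual multi-level objective, with all shapes, labels, and the temperature value chosen for illustration:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss on L2-normalized embeddings: for each anchor,
    maximize similarity to same-label positives relative to all
    other samples in the batch."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # pairwise cosine similarities
    n = len(labels)
    loss = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(np.exp(sim[i, k]) for k in range(n) if k != i)
        loss += -sum(np.log(np.exp(sim[i, j]) / denom)
                     for j in positives) / len(positives)
    return loss / n

# toy batch: two samples per authorship class
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
labels = ["human", "human", "llm", "llm", "hybrid", "hybrid"]
print(supervised_contrastive_loss(emb, labels))
```

Training an encoder to minimize this loss clusters same-authorship texts in embedding space, which is what lets a downstream classifier generalize to unseen domains and LLM families.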
Researchers at MBZUAI have developed LLM-DetectAIve, a tool to classify the degree of machine involvement in text generation. The system categorizes text into four types: human-written, machine-generated, machine-written then machine-humanized, and human-written then machine-polished. A demo website allows users to test the tool's ability to detect machine involvement. Why it matters: This research addresses the growing need to identify and classify AI-generated content in academic and professional settings, particularly in light of increasing LLM misuse.
This paper introduces a framework that combines machine learning for multi-class attack detection in IoT/IIoT networks with large language models (LLMs) for attack behavior analysis and mitigation suggestions. The framework uses role-play prompt engineering with RAG to guide LLMs such as ChatGPT-o3 and DeepSeek-R1, and introduces new evaluation metrics for quantitative assessment. Experiments on the Edge-IIoTset and CICIoT2023 datasets showed that Random Forest was the best detection model and that ChatGPT-o3 outperformed DeepSeek-R1 in attack analysis and mitigation.
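Role-play prompting with RAG can be sketched as a template that assigns the LLM a security-analyst persona and grounds it in retrieved reference passages before it analyzes a detected attack. The template, field names, and example data below are illustrative assumptions, not the paper's exact prompt:

```python
def build_rag_prompt(alert, retrieved_docs):
    """Assemble a role-play prompt: persona instruction, retrieved
    context, then the detected attack to analyze. The retrieval step
    itself (e.g. vector search over a security knowledge base) is
    assumed to have happened upstream."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "You are a senior IoT security analyst.\n"
        f"Reference material:\n{context}\n\n"
        f"Detected attack: {alert['label']} (source {alert['src']}).\n"
        "Explain the attack behaviour and suggest concrete mitigations."
    )

prompt = build_rag_prompt(
    {"label": "DDoS_TCP_Flood", "src": "192.168.1.42"},
    ["TCP flood attacks exhaust connection state tables.",
     "Rate limiting and SYN cookies mitigate SYN floods."],
)
print(prompt)
```

In the pipeline described by the paper, the `alert` would come from the ML detector (e.g. the Random Forest classifier's predicted attack class), so the LLM reasons only about attacks that were already flagged.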
MBZUAI student Zain Muhammad Mujahid is researching methods to detect media bias using NLP and LLMs. His approach profiles bias across media outlets using LLMs like ChatGPT to predict bias based on 16 identifiers. The research aims to develop a tool that instantly provides a bias profile for a given media URL. Why it matters: This research has the potential to combat misinformation and enhance media literacy in the region by providing tools to identify biased reporting, and it is expanding to Arabic and other languages.
This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.
MBZUAI researchers release LLM-DetectAIve, a tool for fine-grained detection of machine-generated text across four categories: human-written, machine-generated, machine-written then humanized, and human-written then machine-polished. The tool aims to address concerns about misuse of LLMs, especially in education and academia, by identifying attempts to obfuscate or polish content. LLM-DetectAIve is publicly accessible with code and a demonstration video provided.