GCC AI Research

Results for "natural language generation"

Towards Trustworthy AI-Generated Text

MBZUAI

Xiuying Chen from KAUST presented her work on improving the trustworthiness of AI-generated text, focusing on accuracy and robustness. Her research analyzes causes of hallucination in language models that relate to semantic understanding and to neglect of input knowledge, and proposes solutions. She also demonstrated the vulnerability of language models to noise and showed that augmentation techniques can enhance robustness. Why it matters: Improving the reliability of AI-generated text is crucial for its deployment in sensitive domains like healthcare and scientific discovery, where accuracy is paramount.

Truth-O-Meter: Making neural content meaningful and truthful

MBZUAI

A new content improvement system has been developed to address randomness and incorrectness in text generated by deep learning models like GPT-3. The system uses text mining to identify correct sentences and employs syntactic/semantic generalization to substitute problematic elements. It can substantially improve the factual correctness and meaningfulness of raw content. Why it matters: Improving the quality of automatically generated content is crucial for ensuring reliability and trustworthiness across various AI applications.

Creating Arabic LLM Prompts at Scale

arXiv

This paper introduces two methods for creating Arabic LLM prompts at scale: translating existing English prompt datasets and creating natural language prompts from Arabic NLP datasets. Using these methods, the authors generated over 67.4 million Arabic prompts covering tasks like summarization and question answering. A 7B Qwen2 model fine-tuned on these prompts outperforms a 70B Llama3 model in handling Arabic prompts. Why it matters: The research provides a cost-effective approach to scaling Arabic LLM training data, potentially improving the performance of smaller, more accessible models for Arabic NLP.

GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human

arXiv

GenAI Content Detection Task 1 is a shared task on detecting machine-generated text, featuring monolingual (English) and multilingual subtasks. The task, part of the GenAI workshop at COLING 2025, attracted 36 teams for the English subtask and 26 for the multilingual one. The organizers provide a detailed overview of the data, the results and system rankings, and an analysis of the submitted systems.