Researchers introduce UnsafeChain, a new safety alignment dataset designed to improve the safety of large reasoning models (LRMs) by focusing on 'hard prompts' that elicit harmful outputs. The dataset identifies and corrects unsafe completions into safe responses, exposing models to unsafe behaviors and guiding their correction. Fine-tuning LRMs on UnsafeChain demonstrates enhanced safety and preservation of general reasoning ability compared to existing datasets like SafeChain and STAR-1.
This paper introduces two methods for creating Arabic LLM prompts at scale: translating existing English prompt datasets and creating natural language prompts from Arabic NLP datasets. Using these methods, the authors generated over 67.4 million Arabic prompts covering tasks like summarization and question answering. Fine-tuning a 7B Qwen2 model on these prompts outperforms a 70B Llama3 model in handling Arabic prompts. Why it matters: The research provides a cost-effective approach to scaling Arabic LLM training data, potentially improving the performance of smaller, more accessible models for Arabic NLP.