Skip to content
GCC AI Research

Search

Results for "Prompting"

Creating Arabic LLM Prompts at Scale

arXiv ·

This paper introduces two methods for creating Arabic LLM prompts at scale: translating existing English prompt datasets and creating natural language prompts from Arabic NLP datasets. Using these methods, the authors generated over 67.4 million Arabic prompts covering tasks like summarization and question answering. Fine-tuning a 7B Qwen2 model on these prompts outperforms a 70B Llama3 model in handling Arabic prompts. Why it matters: The research provides a cost-effective approach to scaling Arabic LLM training data, potentially improving the performance of smaller, more accessible models for Arabic NLP.

Solving complex problems with LLMs: A new prompting strategy presented at NeurIPS

MBZUAI ·

Researchers from MBZUAI and King's College London have developed a new prompting strategy called self-guided exploration to improve LLM performance on combinatorial problems. The method was tested on complex challenges like the traveling salesman problem. The findings will be presented at the 38th Annual Conference on Neural Information Processing Systems (NeurIPS) in Vancouver. Why it matters: This research could lead to practical applications of LLMs in industries like logistics, planning, and scheduling by offering new approaches to computationally complex problems.

Reasoning with interactive guidance

MBZUAI ·

Niket Tandon from the Allen Institute for AI presented a talk at MBZUAI on enabling large language models to focus on human needs and continuously learn from interactions. He proposed a memory architecture inspired by the theory of recursive reminding to guide models in avoiding past errors. The talk addressed who to ask, what to ask, when to ask and how to apply the obtained guidance. Why it matters: The research explores how to align LLMs with human feedback, a key challenge for practical and ethical AI deployment.

MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis

arXiv ·

The paper introduces MedPromptX, a clinical decision support system using multimodal large language models (MLLMs), few-shot prompting (FP), and visual grounding (VG) for chest X-ray diagnosis, integrating imagery with EHR data. MedPromptX refines few-shot data dynamically for real-time adjustment to new patient scenarios and narrows the search area in X-ray images. The study introduces MedPromptX-VQA, a new visual question answering dataset, and demonstrates state-of-the-art performance with an 11% improvement in F1-score compared to baselines.

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

arXiv ·

This paper investigates the intrinsic self-correction capabilities of LLMs, identifying model confidence as a key latent factor. Researchers developed an "If-or-Else" (IoE) prompting framework to guide LLMs in assessing their own confidence and improving self-correction accuracy. Experiments demonstrate that the IoE-based prompt enhances the accuracy of self-corrected responses, with code available on GitHub.

Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts

arXiv ·

A new methodology emulating fact-checker criteria assesses news outlet factuality and bias using LLMs. The approach uses prompts based on fact-checking criteria to elicit and aggregate LLM responses for predictions. Experiments demonstrate improvements over baselines, with error analysis on media popularity and region, and a released dataset/code at https://github.com/mbzuai-nlp/llm-media-profiling.

Fact-Checking Complex Claims with Program-Guided Reasoning

arXiv ·

This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating sub-tasks, enhancing explainability and data efficiency. Experiments on fact-checking datasets demonstrate ProgramFC's superior performance compared to baseline methods, with publicly available code and data.