Researchers developed COVIBOT, a smart chatbot designed to raise awareness and provide assistance during the COVID-19 pandemic in Saudi Arabia. The chatbot is built on Azure Cognitive Services and is available in both English and Arabic. Its use cases were tested and validated using a scenario-based approach.
Researchers address the challenge of limited Arabic medical dialogue data by generating 80,000 synthetic question-answer pairs using ChatGPT-4o and Gemini 2.5 Pro, expanding an initial dataset of 20,000 records. They fine-tuned five LLMs, including Mistral-7B and AraGPT2, and evaluated performance using BERTScore and expert review. Results showed that training with ChatGPT-4o-generated data led to higher F1-scores and fewer hallucinations across models. Why it matters: This demonstrates the potential of synthetic data augmentation to improve domain-specific Arabic language models, particularly for low-resource medical NLP applications.
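For readers unfamiliar with the metric, the following is a minimal sketch of the greedy-matching precision, recall, and F1 computation that underlies BERTScore. It is not the authors' evaluation code. The real metric uses contextual token embeddings from a pretrained model (and optionally IDF weighting); here the embeddings are hand-written toy vectors, and the function names `normalize` and `greedy_match_f1` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def greedy_match_f1(ref_embs, cand_embs):
    """Return (precision, recall, f1) from greedy cosine matching.

    ref_embs and cand_embs are lists of token embedding vectors. In actual
    BERTScore these come from a pretrained language model; the toy vectors
    used below are an assumption for illustration only.
    """
    ref = [normalize(v) for v in ref_embs]
    cand = [normalize(v) for v in cand_embs]

    # Pairwise cosine similarity: rows are reference tokens, columns are candidate tokens.
    sim = [[sum(a * b for a, b in zip(r, c)) for c in cand] for r in ref]

    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(ref)

    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(sim[i][j] for i in range(len(ref))) for j in range(len(cand))
    ) / len(cand)

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy two-dimensional "embeddings" (hypothetical values, not model output).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = greedy_match_f1(reference, candidate)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A candidate whose token embeddings are identical to the reference scores 1.0 on all three values. Scores fall as candidate tokens drift away from their closest reference tokens.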
A new benchmark, LongShOTBench, evaluates multimodal reasoning and tool use in long videos through open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness, and its samples are human-validated. The authors also present LongShOTAgent, an agentic system for analyzing long videos. Together, the benchmark and the agent highlight the challenges that state-of-the-art MLLMs still face.