GCC AI Research

Search

Results for "AI agents"

ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

arXiv ·

The paper introduces ILION, a deterministic execution gate designed to ensure the safety of autonomous AI agents by classifying proposed actions as either BLOCK or ALLOW. ILION uses a five-component cascade architecture that operates without statistical training, API dependencies, or labeled data. Evaluation against existing text-safety infrastructures demonstrates ILION's superior performance in preventing unauthorized actions, achieving an F1 score of 0.8515 with sub-millisecond latency.

FIRE: Fact-checking with Iterative Retrieval and Verification

arXiv ·

A novel agent-based framework called FIRE is introduced for fact-checking long-form text. FIRE iteratively integrates evidence retrieval and claim verification, deciding whether to provide a final answer or generate a subsequent search query. Experiments show FIRE achieves comparable performance to existing methods while reducing LLM costs by 7.6x and search costs by 16.5x.

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

arXiv ·

MBZUAI introduces Agent-X, a benchmark for evaluating multi-step reasoning in vision-centric agents across real-world, multimodal settings. Agent-X includes 828 tasks with diverse visual contexts and spans six environments, requiring tool use and stepwise decision-making. Experiments show that current LLMs struggle with multi-step vision tasks, achieving less than 50% success, highlighting areas for improvement in LMM reasoning and tool use.

DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning under Two-sided Incomplete Information

arXiv ·

This paper introduces DaringFed, a novel dynamic Bayesian persuasion pricing mechanism for online federated learning (OFL) that addresses the challenge of two-sided incomplete information (TII) regarding resources. It formulates the interaction between the server and clients as a dynamic signaling and pricing allocation problem within a Bayesian persuasion game, demonstrating the existence of a unique Bayesian persuasion Nash equilibrium. Evaluations on real and synthetic datasets demonstrate that DaringFed optimizes accuracy and convergence speed and improves the server's utility.

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length and multimodal richness, using human-validated samples. LongShOTAgent, an agentic system, is also presented for analyzing long videos, with both the benchmark and agent demonstrating the challenges faced by state-of-the-art MLLMs.

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

arXiv ·

Researchers introduce MATRIX, a vision-centric agent tuning framework for robust tool-use reasoning in VLMs. The framework includes M-TRACE, a dataset of 28.5K multimodal tasks with 177K verified trajectories, and Pref-X, a set of 11K automatically generated preference pairs. Experiments show MATRIX consistently outperforms open- and closed-source VLMs across three benchmarks.

Transdisciplinary AI Education: The Confluence of Curricular and Community Needs in the Instruction of Artificial Intelligence

arXiv ·

This paper discusses the integration of AI into education, emphasizing a transdisciplinary approach that connects AI instruction to the broader curriculum and community needs. It delves into the AI program developed for Neom Community School in Saudi Arabia, where AI is taught as a subject and used to learn other subjects through the International Baccalaureate (IB) approach. The proposed method aims to make AI relevant throughout the curriculum by integrating it into Units of Inquiry.