GCC AI Research

Results for "knowledge extraction"

AI-Assisted Knowledge Navigation

MBZUAI ·

Akhil Arora from EPFL presented a framework for AI-assisted knowledge navigation that focuses on understanding and improving how people navigate Wikipedia. The framework includes methods for modeling navigation patterns, identifying knowledge gaps, and assessing the causal impact of those gaps. He also discussed applications beyond Wikipedia, such as multimodal knowledge navigation assistants and multilingual knowledge gap mitigation. Why it matters: This research could make online knowledge more accessible and easier to navigate, especially on platforms such as Wikipedia that serve as critical resources for global knowledge sharing.

Multimodal Factual Knowledge Acquisition

MBZUAI ·

Manling Li from UIUC proposes a new research direction: Event-Centric Multimodal Knowledge Acquisition, which transforms traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. The approach addresses challenges in understanding multimodal semantic structures using zero-shot cross-modal transfer (CLIP-Event) and long-horizon temporal dynamics through the Event Graph Model. Li's work aims to enable machines to capture complex timelines and relationships, with applications in timeline generation, meeting summarization, and question answering. Why it matters: This research pioneers a new approach to multimodal information extraction, moving from static entity-based understanding to dynamic, event-centric knowledge acquisition, which is essential for advanced AI applications in understanding complex scenarios.

Cross-Document Topic-Aligned Chunking for Retrieval-Augmented Generation

arXiv ·

This paper introduces Cross-Document Topic-Aligned (CDTA) chunking to address knowledge fragmentation in Retrieval-Augmented Generation (RAG) systems. CDTA identifies topics across documents, maps segments to topics, and synthesizes them into unified chunks. Experiments on HotpotQA and UAE legal texts show that CDTA improves faithfulness and citation accuracy compared to existing chunking methods, especially for complex queries requiring multi-hop reasoning.
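The three steps named above (identify topics, map segments to topics, merge them into unified chunks) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the keyword-based `TOPIC_KEYWORDS` table, the naive sentence splitter, and plain concatenation in place of synthesis are all assumptions made for illustration, and the document texts are made-up toy data.

```python
from collections import defaultdict

# Hypothetical topic vocabulary. A real system would discover topics from
# the corpus instead of hard-coding keywords.
TOPIC_KEYWORDS = {
    "termination": {"termination", "dismissal", "notice"},
    "leave": {"leave", "vacation", "holiday"},
}


def split_segments(text):
    """Split a document into sentence-like segments (naive split on '.')."""
    return [s.strip() for s in text.split(".") if s.strip()]


def assign_topic(segment):
    """Return the topic whose keywords overlap the segment most, or None."""
    words = set(segment.lower().split())
    best, best_overlap = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = topic, overlap
    return best


def build_topic_chunks(documents):
    """Group segments from all documents by topic, keeping source ids."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for segment in split_segments(text):
            topic = assign_topic(segment)
            if topic is not None:
                grouped[topic].append((doc_id, segment))
    chunks = {}
    for topic, items in grouped.items():
        chunks[topic] = {
            # Concatenation stands in for the synthesis step; each segment
            # keeps a citation marker pointing back to its source document.
            "text": " ".join(f"{seg}. [{doc_id}]" for doc_id, seg in items),
            "sources": sorted({doc_id for doc_id, _ in items}),
        }
    return chunks


# Made-up toy documents, used only to exercise the functions.
documents = {
    "doc_a": "Notice of termination must be given in writing. Annual leave is thirty days.",
    "doc_b": "Dismissal without notice is restricted. Unused leave may be carried over.",
}
chunks = build_topic_chunks(documents)
```

Each resulting chunk gathers material on one topic from several documents, so a retriever can return a single chunk for a question that would otherwise need passages from both sources.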

Explainable Fact Checking for Statistical and Property Claims

MBZUAI ·

EURECOM researchers developed data-driven verification methods that use structured datasets to assess statistical and property claims. For statistical claims, the approach translates the text of a claim into SQL queries over relational databases. For property claims, it uses knowledge graphs to verify the claim and generate an explanation. Why it matters: The methods aim to support fact-checkers by efficiently labeling claims with interpretable explanations, potentially combating misinformation in the region and beyond.
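The statistical-claim path can be sketched roughly as follows, using Python's standard `sqlite3` module. This is an illustrative assumption rather than the researchers' system: the claim is assumed to be already parsed into a structured dictionary, and the `indicators` table, its columns, the tolerance value, and the data are all hypothetical.

```python
import sqlite3


def verify_claim(conn, claim, tolerance=0.05):
    """Check a parsed statistical claim against the database.

    Returns a label plus a short explanation built from the query result.
    """
    query = "SELECT value FROM indicators WHERE country = ? AND year = ?"
    row = conn.execute(query, (claim["country"], claim["year"])).fetchone()
    if row is None:
        return {"label": "not enough data", "explanation": "No matching row."}
    actual = row[0]
    # The claim counts as supported when it is within a relative tolerance.
    supported = abs(actual - claim["value"]) <= tolerance * abs(actual)
    return {
        "label": "supported" if supported else "refuted",
        "explanation": f"Database value {actual} vs claimed {claim['value']}.",
    }


# Hypothetical in-memory table with made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (country TEXT, year INTEGER, value REAL)")
conn.execute("INSERT INTO indicators VALUES ('A', 2020, 100.0)")

close_claim = verify_claim(conn, {"country": "A", "year": 2020, "value": 102.0})
far_claim = verify_claim(conn, {"country": "A", "year": 2020, "value": 150.0})
missing_claim = verify_claim(conn, {"country": "B", "year": 2020, "value": 1.0})
```

Returning the retrieved value alongside the label is what makes the verdict interpretable: a fact-checker can see which number the decision rests on.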

Physics of Language Models: Knowledge Storage, Extraction, and Manipulation

MBZUAI ·

A CMU professor who is also an MBZUAI-affiliated faculty member presented research on how LLMs store and use knowledge learned during pre-training. Using a synthetic biography dataset, the study showed that LLMs may fail to use memorized knowledge effectively at inference time, even when training loss reaches zero. Data augmentation during pre-training can force the model to store knowledge in specific token embeddings. Why it matters: The research highlights limitations in how LLMs extract and manipulate knowledge, with implications for model architectures and training strategies that make better use of stored knowledge, including in Arabic LLMs.

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs

arXiv ·

KAUST researchers introduced MOLE, a framework that uses LLMs to automatically extract metadata from scientific papers. The system processes documents in multiple formats and validates its outputs, and it covers datasets in languages other than Arabic. A new benchmark dataset has been released to evaluate progress in metadata extraction.
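Validating extracted metadata might look something like the Python sketch below. The schema, field names, and allowed values are hypothetical and are not taken from MOLE; the sketch only shows the general idea of checking an LLM's output against declared types and constraints.

```python
# Hypothetical schema: each field declares a type and optional constraints.
SCHEMA = {
    "name": {"type": str},
    "year": {"type": int, "min": 1900, "max": 2100},
    "license": {"type": str, "options": {"MIT", "Apache-2.0", "CC BY 4.0", "unknown"}},
}


def validate_metadata(record):
    """Return a list of validation errors (empty when the record is valid)."""
    errors = []
    for field, rule in SCHEMA.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and not rule["min"] <= value <= rule["max"]:
            errors.append(f"{field}: out of range")
        if "options" in rule and value not in rule["options"]:
            errors.append(f"{field}: not an allowed option")
    return errors


valid_errors = validate_metadata({"name": "X", "year": 2024, "license": "MIT"})
invalid_errors = validate_metadata({"name": "X", "year": "2024"})
```

Collecting every error instead of stopping at the first makes it possible to report all problems with a record in one pass, for example when asking the model to retry.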