The paper introduces Guided Deep List, a tool for automatically generating epidemiological line lists from open-source reports. The tool uses distributed vector representations and dependency parsing to extract tabular case data on disease outbreaks. Evaluated on MERS outbreak data from Saudi Arabia, it demonstrated improved accuracy over baseline methods and enabled downstream epidemiological inferences.
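To make the extraction step concrete, here is a minimal sketch of pulling one line-list row (one case) from a report sentence. The paper's actual method guides extraction with word embeddings and dependency-parse patterns; the regex patterns and field names below are simplified stand-ins for illustration only.

```python
import re

# Hypothetical stand-in for the extraction step: Guided Deep List uses word
# embeddings and dependency parsing; here simple regex patterns pull
# line-list fields (age, gender, outcome) from a sentence of a report.
FIELD_PATTERNS = {
    "age": re.compile(r"(\d{1,3})[- ]year[- ]old"),
    "gender": re.compile(r"\b(male|female)\b", re.IGNORECASE),
    "outcome": re.compile(r"\b(died|recovered|stable)\b", re.IGNORECASE),
}

def extract_line_list_row(sentence: str) -> dict:
    """Extract one line-list row (one case) from a report sentence."""
    row = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(sentence)
        row[field] = match.group(1).lower() if match else None
    return row

report = "A 61-year-old male from Riyadh developed symptoms on 12 June and died."
print(extract_line_list_row(report))
# {'age': '61', 'gender': 'male', 'outcome': 'died'}
```

Applying such a row extractor sentence by sentence across a corpus of outbreak bulletins yields the tabular line list the paper targets.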
The InterText project, funded by the European Research Council, aims to advance NLP by developing a framework for modeling fine-grained relationships between texts. This approach enables tracing the origin and evolution of texts and ideas. Iryna Gurevych from the Technical University of Darmstadt presented the intertextual approach to NLP, covering data modeling, representation learning, and practical applications. Why it matters: This research could enable a new generation of AI applications for text work and critical reading, with potential applications in collaborative knowledge construction and document revision assistance.
Manling Li from UIUC proposes a new research direction: Event-Centric Multimodal Knowledge Acquisition, which transforms traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. The approach addresses challenges in understanding multimodal semantic structures using zero-shot cross-modal transfer (CLIP-Event) and long-horizon temporal dynamics through the Event Graph Model. Li's work aims to enable machines to capture complex timelines and relationships, with applications in timeline generation, meeting summarization, and question answering. Why it matters: This research pioneers a new approach to multimodal information extraction, moving from static entity-based understanding to dynamic, event-centric knowledge acquisition, which is essential for advanced AI applications in understanding complex scenarios.
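The event-centric idea can be illustrated with a small sketch: events become graph nodes carrying their arguments, directed edges encode temporal "before" relations, and a topological sort over those edges recovers a candidate timeline. The class and event names below are hypothetical, not the paper's Event Graph Model implementation.

```python
from collections import defaultdict, deque

# Minimal sketch of an event-centric graph (names hypothetical): events are
# nodes carrying argument dicts; directed edges encode temporal "before"
# relations; a topological sort over the edges yields a candidate timeline.
class EventGraph:
    def __init__(self):
        self.events = {}                # event id -> argument dict
        self.after = defaultdict(list)  # temporal edges: id -> later event ids

    def add_event(self, eid, **args):
        self.events[eid] = args

    def add_before(self, earlier, later):
        self.after[earlier].append(later)

    def timeline(self):
        """Order events consistently with the temporal edges (Kahn's algorithm)."""
        indegree = {eid: 0 for eid in self.events}
        for earlier in self.after:
            for later in self.after[earlier]:
                indegree[later] += 1
        queue = deque(eid for eid, d in indegree.items() if d == 0)
        order = []
        while queue:
            eid = queue.popleft()
            order.append(eid)
            for later in self.after[eid]:
                indegree[later] -= 1
                if indegree[later] == 0:
                    queue.append(later)
        return order

g = EventGraph()
g.add_event("protest", agent="crowd", place="square")
g.add_event("arrest", agent="police")
g.add_event("trial", defendant="protesters")
g.add_before("protest", "arrest")
g.add_before("arrest", "trial")
print(g.timeline())  # ['protest', 'arrest', 'trial']
```

Timeline generation and question answering over such a structure amount to traversals of the graph rather than lookups over isolated entities, which is the shift the talk argues for.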
Akhil Arora from EPFL presented a framework for AI-assisted knowledge navigation, focusing on understanding and enhancing human navigation on Wikipedia. The framework includes methods for modeling navigation patterns, identifying knowledge gaps, and assessing their causal impact. He also discussed applications beyond Wikipedia, such as multimodal knowledge navigation assistants and multilingual knowledge gap mitigation. Why it matters: This research has the potential to improve information systems by making online knowledge more accessible and navigable, especially for platforms like Wikipedia that serve as critical resources for global knowledge sharing.
A new paper from MBZUAI researchers explores using ChatGPT to combat the spread of fake news. The researchers, including Preslav Nakov and Liangming Pan, demonstrate that ChatGPT can be used to fact-check published information. Their paper, "Fact-Checking Complex Claims with Program-Guided Reasoning," was accepted at ACL 2023. Why it matters: This research highlights the potential of large language models to address the growing challenge of misinformation, with implications for maintaining information integrity in the digital age.
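A hedged sketch of the program-guided idea: a complex claim is decomposed into simpler sub-claims, each is verified independently, and the verdicts are combined logically. In the paper the reasoning program is generated by a large language model; here the program, the sub-claim verifier, and the evidence entries are hand-written stand-ins.

```python
# Hedged sketch of program-guided fact-checking: the paper generates the
# reasoning program with an LLM; the program, verifier, and evidence below
# are hand-written stand-ins for illustration.

# Toy evidence base (illustrative entries only).
EVIDENCE = {
    "ACL 2023 was held in Toronto": True,
    "Toronto is in Canada": True,
}

def verify(sub_claim: str) -> bool:
    """Stand-in sub-claim verifier: look the sub-claim up in the evidence."""
    return EVIDENCE.get(sub_claim, False)

def run_program(sub_claims: list) -> bool:
    """Execute the reasoning program: the claim holds iff all sub-claims do."""
    return all(verify(c) for c in sub_claims)

# "ACL 2023 was held in Canada" decomposed into two checkable sub-claims.
program = ["ACL 2023 was held in Toronto", "Toronto is in Canada"]
print(run_program(program))  # True
```

The decomposition is what makes complex claims tractable: each sub-claim can be checked against retrieved evidence in isolation, and the program records an interpretable chain of reasoning.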
EURECOM researchers developed data-driven verification methods that use structured datasets to assess statistical and property claims. Statistical claims are translated into SQL queries over relational databases; property claims are checked against knowledge graphs, which also support generating explanations. Why it matters: The methods aim to support fact-checkers by efficiently labeling claims with interpretable explanations, potentially combating misinformation in the region and beyond.
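The statistical branch can be sketched in a few lines: a claim about a numeric fact becomes a SQL query whose result labels the claim. The table, data, and claim below are illustrative, not EURECOM's actual datasets.

```python
import sqlite3

# Minimal sketch of data-driven claim verification (table and claim are
# illustrative): a statistical claim is translated into a SQL query over a
# relational table, and the query result labels the claim.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdp (country TEXT, year INTEGER, growth REAL)")
conn.executemany(
    "INSERT INTO gdp VALUES (?, ?, ?)",
    [("France", 2022, 2.5), ("Germany", 2022, 1.8), ("Italy", 2022, 3.7)],
)

# Claim: "France's GDP grew by more than 2% in 2022."
claim_sql = "SELECT growth > 2.0 FROM gdp WHERE country = ? AND year = ?"
(verdict,) = conn.execute(claim_sql, ("France", 2022)).fetchone()
print("SUPPORTED" if verdict else "REFUTED")  # SUPPORTED
```

The SQL query itself doubles as an interpretable explanation of the verdict: it states exactly which cell of which table was compared against which threshold.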
This paper introduces GigaBERT, a customized bilingual BERT model pre-trained for Arabic NLP and English-to-Arabic zero-shot transfer learning. The study evaluates GigaBERT's performance on four information extraction tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction. Results show that GigaBERT outperforms mBERT, XLM-RoBERTa, and AraBERT in both supervised and zero-shot transfer settings. Why it matters: GigaBERT advances Arabic NLP by providing a high-performing, publicly available model tailored for the complexities of the Arabic language and cross-lingual applications.
Jan Buchmann from TU Darmstadt presented research at MBZUAI on NLP for long, structured documents. The research addresses gaps in exploiting document structure and in improving the verifiability of language model responses. Experiments showed that models learn to represent document structure during pre-training and that larger models are better at citing their sources. Why it matters: This research contributes to making NLP more effective for complex documents like scientific articles and legal texts, which is crucial for information accessibility.