Skip to content
GCC AI Research

Search

Results for "content classification"

Algorithms and Software for Text Classification

MBZUAI ·

The article discusses the challenges in effectively applying text classification techniques, despite the availability of tools like LibMultiLabel. It highlights the importance of guiding users to appropriately use machine learning methods due to considerations in practical applications such as evaluation criteria and data strategies. The piece also mentions a panel discussion hosted by MBZUAI in collaboration with the Manara Center for Coexistence and Dialogue. Why it matters: This signals ongoing efforts within the UAE AI ecosystem to address practical challenges and promote responsible AI usage in NLP applications.

Multimodal pretraining for objectionable content detection in videos

MBZUAI ·

Thamar Solorio from the University of Houston presented preliminary work on multimodal representation learning for detecting objectionable content in videos at MBZUAI. The research investigates two multimodal pretraining mechanisms, finding contrastive learning more effective than unimodal representation prediction. The study also assesses the value of common multimodal corpora for this task. Why it matters: This research contributes to the development of AI techniques for content moderation, an important issue for online platforms in the Middle East and globally.

Truth-O-Meter: Making neural content meaningful and truthful

MBZUAI ·

A new content improvement system has been developed to address issues of randomness and incorrectness in text generated by deep learning models like GPT-3. The system uses text mining to identify correct sentences and employs syntactic/semantic generalization to substitute problematic elements. The system can substantially improve the factual correctness and meaningfulness of raw content. Why it matters: Improving the quality of automatically generated content is crucial for ensuring reliability and trustworthiness across various AI applications.

Hybrid Deep Feature Extraction and ML for Construction and Demolition Debris Classification

arXiv ·

This paper introduces a hybrid deep learning and machine learning pipeline for classifying construction and demolition waste. A dataset of 1,800 images from UAE construction sites was created, and deep features were extracted using a pre-trained Xception network. The combination of Xception features with machine learning classifiers achieved up to 99.5% accuracy, demonstrating state-of-the-art performance for debris identification.

A mystery fit for a DetectAIve: Classifying machine involvement in writing

MBZUAI ·

Researchers at MBZUAI have developed LLM-DetectAIve, a tool to classify the degree of machine involvement in text generation. The system categorizes text into four types: human-written, machine-generated, machine-written and machine-humanized, and human-written and machine-polished. A demo website allows users to test the tool's ability to detect machine involvement. Why it matters: This research addresses the growing need to identify and classify AI-generated content in academic and professional settings, particularly in light of increasing LLM misuse.

FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models

arXiv ·

The paper introduces FanarGuard, a bilingual moderation filter for Arabic and English language models that considers both safety and cultural alignment. A dataset of 468K prompt-response pairs was created and scored by LLM judges on harmlessness and cultural awareness to train the filter. The first benchmark targeting Arabic cultural contexts was developed to evaluate cultural alignment. Why it matters: FanarGuard advances context-sensitive AI safeguards by integrating cultural awareness into content moderation, addressing a critical gap in current alignment techniques.

Hunting for Spammers: Detecting Evolved Spammers on Twitter

arXiv ·

A study analyzes spam content on trending hashtags on Saudi Twitter, finding that approximately 75% of the total generated content is spam. The paper assesses the performance of previous spam detection systems on a newly gathered dataset and proposes an updated manual classification algorithm to improve accuracy. Adapted features are used to build a new data-driven detection system to respond to spammers' evolving techniques. Why it matters: The high prevalence of spam in Arabic content on Twitter necessitates the development of adaptive detection techniques to maintain the quality and trustworthiness of online information in the region.