Skip to content
GCC AI Research

Search

Results for "fine-grained classification"

Fine-tuning Text-to-Image Models: Reinforcement Learning and Reward Over-Optimization

MBZUAI ·

The article discusses research on fine-tuning text-to-image diffusion models, including reward function training, online reinforcement learning (RL) fine-tuning, and addressing reward over-optimization. A Text-Image Alignment Assessment (TIA2) benchmark is introduced to study reward over-optimization. TextNorm, a method for confidence calibration in reward models, is presented to reduce over-optimization risks. Why it matters: Improving the alignment and fidelity of text-to-image models is crucial for generating high-quality content, and addressing over-optimization enhances the reliability of these models in creative applications.

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

arXiv ·

The paper introduces the Prism Hypothesis, which posits a correspondence between an encoder's feature spectrum and its functional role, with semantic encoders capturing low-frequency components and pixel encoders retaining high-frequency information. Based on this, the authors propose Unified Autoencoding (UAE), a model that harmonizes semantic structure and pixel details using a frequency-band modulator. Experiments on ImageNet and MS-COCO demonstrate that UAE effectively unifies semantic abstraction and pixel-level fidelity, achieving state-of-the-art performance.

Hybrid Deep Feature Extraction and ML for Construction and Demolition Debris Classification

arXiv ·

This paper introduces a hybrid deep learning and machine learning pipeline for classifying construction and demolition waste. A dataset of 1,800 images from UAE construction sites was created, and deep features were extracted using a pre-trained Xception network. The combination of Xception features with machine learning classifiers achieved up to 99.5% accuracy, demonstrating state-of-the-art performance for debris identification.