GCC AI Research

Archive Monthly

June 2024

5 articles

Top Stories

Diffusion-BBO: Diffusion-Based Inverse Modeling for Online Black-Box Optimization

arXiv · · Research RL

This paper introduces Diffusion-BBO, a new online black-box optimization (BBO) framework that uses a conditional diffusion model as an inverse surrogate model. The framework employs an Uncertainty-aware Exploration (UaE) acquisition function to propose scores in the objective space for conditional sampling. The approach is shown theoretically to achieve a near-optimal solution and empirically outperforms existing online BBO baselines across 6 scientific discovery tasks.

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

arXiv · · Research LLM

MBZUAI researchers introduce Web2Code, a new large-scale dataset and evaluation framework for training and benchmarking multimodal LLMs on webpage understanding and HTML code generation. The dataset includes webpage images, HTML code, and QA pairs about webpage content. Experiments demonstrate the dataset's utility in webpage understanding, code generation, and general visual domain tasks, with code and data available on Github.

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

arXiv · · Research LLM

MBZUAI researchers introduce VideoGPT+, a novel video Large Multimodal Model (LMM) that integrates image and video encoders to leverage both spatial and temporal information in videos. They also introduce VCGBench-Diverse, a comprehensive benchmark for evaluating video LMMs across 18 video categories. VideoGPT+ demonstrates improved performance on multiple video benchmarks, including VCGBench and MVBench.

Deep Vision-Based Framework for Coastal Flood Prediction Under Climate Change Impacts and Shoreline Adaptations

arXiv · · Research CV

This paper introduces a deep vision-based framework for predicting coastal floods under climate change, addressing the challenges of limited training data and high-dimensional output. The framework employs and compares various deep learning models, including a custom compact CNN architecture, against geostatistical and traditional machine learning methods. A new synthetic dataset of flood inundation maps for Abu Dhabi's coast is also provided to benchmark future models.