GCC AI Research

Results for "Web2Code"

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

arXiv ·

MBZUAI researchers introduce Web2Code, a new large-scale dataset and evaluation framework for training and benchmarking multimodal LLMs on webpage understanding and HTML code generation. The dataset includes webpage images, HTML code, and QA pairs about webpage content. Experiments demonstrate the dataset's utility in webpage understanding, code generation, and general visual domain tasks, with code and data available on GitHub.

Web2Code: A new dataset to enhance multimodal LLM performance presented at NeurIPS

MBZUAI ·

MBZUAI researchers introduced Web2Code, a new dataset suite, at NeurIPS to enhance multimodal LLM performance in web page analysis and HTML generation. The suite includes a fine-tuning dataset and two benchmark datasets. Instruction tuning with Web2Code improved performance on specialized tasks without affecting general capabilities. Why it matters: This contribution addresses a key limitation in current multimodal LLMs, potentially boosting productivity in web design and development by providing targeted training data.

Can we tell when AI wrote that code? This project thinks so, even when the AI tries to hide it

MBZUAI ·

MBZUAI researchers introduced Droid, a resource suite and detector family, at EMNLP 2025 designed to distinguish AI-generated from human-written code. The project addresses the challenge of identifying AI-generated code in software development, given the prevalence of AI-suggested code and the risks of obfuscated backdoors and feedback loops. DroidCollection includes over one million code samples across seven programming languages, three coding domains, and outputs from 43 different code models, including human-AI co-authored code and adversarially humanized machine code. Why it matters: This research is crucial for maintaining software security and integrity in the age of AI-assisted coding, providing a robust tool for detecting AI-generated code across diverse languages and domains.

How Secure Is AI-Generated Code: A Large-Scale Comparison of Large Language Models

arXiv ·

A study compared the vulnerability of C programs generated by nine state-of-the-art large language models (LLMs) using a zero-shot prompt. The researchers introduced FormAI-v2, a dataset of 331,000 C programs generated by these LLMs, and found that at least 62.07% of the generated programs contained vulnerabilities detected via formal verification. The research highlights the need for risk assessment and validation when deploying LLM-generated code in production environments.

Fact-Checking Complex Claims with Program-Guided Reasoning

arXiv ·

This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating sub-tasks, enhancing explainability and data efficiency. Experiments on fact-checking datasets demonstrate ProgramFC's superior performance compared to baseline methods, with publicly available code and data.
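The decompose-then-delegate idea behind program-guided reasoning can be sketched in a few lines. This is an illustrative toy, not the paper's actual implementation: the function names and the hard-coded knowledge table are assumptions, standing in for the LLM that, in ProgramFC, generates the reasoning program and answers the sub-questions.

```python
# Toy sketch of program-guided fact-checking in the spirit of ProgramFC.
# In the real system an LLM generates the reasoning program and a model or
# retriever answers each sub-question; here both are stubbed out.

def answer(question, knowledge):
    """Stand-in for an LLM/retriever answering a simple sub-question."""
    return knowledge.get(question, "unknown")

def verify_claim(knowledge):
    """A 'reasoning program' decomposing a complex claim into sub-tasks.

    Complex claim: "The film was released in 1999 and its director
    also wrote the script."
    """
    fact_1 = answer("When was the film released?", knowledge) == "1999"
    fact_2 = answer("Did the director write the script?", knowledge) == "yes"
    # The claim holds only if every sub-claim is supported.
    return fact_1 and fact_2

knowledge = {
    "When was the film released?": "1999",
    "Did the director write the script?": "no",
}
print(verify_claim(knowledge))  # False: the second sub-claim fails
```

Because each sub-task and its answer are explicit, the chain of evidence behind a verdict is inspectable, which is the explainability benefit the summary above refers to.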

Web-Based Expert System for Civil Service Regulations: RCSES

arXiv ·

The paper introduces a web-based expert system called RCSES for civil service regulations in Saudi Arabia. The system covers 17 regulations and utilizes XML for knowledge representation and ASP.NET for rule-based inference. RCSES was validated by domain experts and technical users, and compared favorably to other web-based expert systems.