Researchers at MBZUAI have introduced EvoLMM, a self-evolving framework for large multimodal models that enhances reasoning capabilities without human-annotated data or reward distillation. EvoLMM uses two cooperative agents, a Proposer and a Solver, which generate image-grounded questions and solve them through internal consistency, using a continuous self-rewarding process. Evaluations using Qwen2.5-VL as the base model showed performance gains of up to 3% on multimodal math-reasoning benchmarks like ChartQA, MathVista, and MathVision using only raw training images.
Sami Haddadin from the Technical University of Munich (TUM) discusses a shift in robotics towards machines that autonomously develop their own blueprints and controls. He highlights advancements driven by human-centered design, soft control, and model-based machine learning, enabling human-robot collaboration in manufacturing and healthcare. Haddadin also presents progress towards autonomous machine design and modular control architectures for complex manipulation tasks. Why it matters: This research has implications for advancing robotics and AI in the GCC region, especially in manufacturing and healthcare, by enabling safer and more efficient human-robot collaboration.
Eliseo Ferrante from NYU Abu Dhabi presented work on increasing the controllability of swarm robotics systems. The research covers microscopic control via implicit intelligent leaders and macroscopic control via automated generation of swarm behaviors. Grammatical evolution and generative AI methods are used to produce collective behaviors aligned with human specifications. Why it matters: This research enhances the applicability of swarm robotics in real-world scenarios by improving control and coordination, potentially impacting industries like logistics, environmental monitoring, and disaster response in the region.
Dr. Mikhail Burtsev of the London Institute presented research on GENA-LM, a suite of transformer-based DNA language models. The talk addressed the challenge of scaling transformers for genomic sequences, proposing recurrent memory augmentation to handle long input sequences efficiently. This approach improves language modeling performance and holds promise for memory-intensive applications in bioinformatics. Why it matters: This research can significantly advance AI's capabilities in genomics by enabling the processing of much larger DNA sequences, with potential breakthroughs in understanding and treating diseases.
This paper introduces Diffusion-BBO, a new online black-box optimization (BBO) framework that uses a conditional diffusion model as an inverse surrogate model. The framework employs an Uncertainty-aware Exploration (UaE) acquisition function to propose scores in the objective space for conditional sampling. The approach is shown theoretically to achieve a near-optimal solution and empirically outperforms existing online BBO baselines across 6 scientific discovery tasks.