This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework for Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO combines Unknown-aware Adaptive Entropy Optimization (UAE) with Adaptive Modality Prediction Discrepancy Optimization (AMP) to separate unknown-class samples from known-class samples during online adaptation by amplifying the entropy difference between them. The study also establishes a new benchmark, derived from existing datasets and covering five modalities, and evaluates AEO across various domain-shift scenarios, where it proves effective in long-term and continual MM-OSTTA settings.
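The entropy-gap idea in the AEO summary can be illustrated with a minimal sketch. It is not the paper's UAE loss: the hard `threshold`, the sign-flip weighting, and the function names are all assumptions made for illustration. The sketch only shows how one objective can push confident (likely known) samples toward lower entropy and uncertain (likely unknown) samples toward higher entropy.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(num_classes), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def signed_entropy_objective(logits, threshold=0.5):
    """Hypothetical per-sample objective (an assumption, not the paper's loss).

    Minimizing the returned value lowers the entropy of samples that are
    already below `threshold` and raises the entropy of samples at or above
    it, which widens the gap between the two groups.
    """
    h = normalized_entropy(softmax(logits))
    weight = 1.0 if h < threshold else -1.0
    return weight * h
```

A confident prediction such as `[10.0, 0.0, 0.0]` yields a small positive value (entropy is minimized), while a flat prediction such as `[0.0, 0.0, 0.0]` yields a negative value (entropy is maximized).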
This paper introduces a novel black-box adversarial attack method, Mixup-Attack, to generate universal adversarial examples for remote sensing data. The method identifies common vulnerabilities in neural networks by attacking features in the shallow layer of a surrogate model. The authors also present UAE-RS, the first dataset of black-box adversarial samples in remote sensing, to benchmark the robustness of deep learning models against adversarial attacks.
This paper introduces MOTOR, a multimodal retrieval and re-ranking approach for medical visual question answering (MedVQA) that uses grounded captions and optimal transport to capture relationships between queries and retrieved context, leveraging both textual and visual information. MOTOR identifies clinically relevant contexts to augment the input of the vision-language model (VLM). Empirical analysis shows that MOTOR outperforms state-of-the-art methods on MedVQA datasets by an average of 6.45%.
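As a rough illustration of optimal-transport re-ranking as described in the MOTOR summary, the sketch below scores each retrieved context by the entropic (Sinkhorn) transport cost between query vectors and context vectors, then sorts by that cost. The cosine cost, uniform marginals, Sinkhorn solver, `eps` value, and all function names are assumptions; the summary does not specify which formulation MOTOR uses. All vectors are assumed to be non-zero.

```python
import math

def cosine_cost(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def sinkhorn_cost(query, context, eps=0.1, iters=200):
    """Entropic OT cost between two sets of vectors with uniform marginals."""
    n, m = len(query), len(context)
    cost = [[cosine_cost(q, c) for c in context] for q in query]
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scaling so rows sum to 1/n and columns sum to 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; sum plan * cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

def rerank(query, candidates):
    """Return candidate names ordered from lowest to highest transport cost."""
    return sorted(candidates, key=lambda name: sinkhorn_cost(query, candidates[name]))
```

For example, with `query = [[1, 0], [0, 1]]`, a candidate holding the same two vectors ranks ahead of one holding their negations, because its transport cost is close to zero.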
This paper introduces Provable Unrestricted Adversarial Training (PUAT), a novel adversarial training approach. PUAT enhances robustness against both unrestricted and restricted adversarial examples while improving standard generalizability by aligning the distributions of adversarial examples, natural data, and the classifier's learned distribution. The approach uses partially labeled data and an augmented triple-GAN to generate effective unrestricted adversarial examples, demonstrating superior performance on benchmarks.
This paper introduces VENOM, a text-driven framework for generating high-quality unrestricted adversarial examples using diffusion models. VENOM unifies image content generation and adversarial synthesis into a single reverse diffusion process, improving both attack success rate and image quality. The framework incorporates an adaptive adversarial guidance strategy with momentum so that the generated adversarial examples stay aligned with the distribution of natural images.
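The momentum component mentioned in the VENOM summary can be sketched generically. The code below accumulates L1-normalized gradients into a velocity term, in the style of momentum-based iterative attacks; it is an assumption about the general mechanism, not VENOM's actual guidance rule, and it omits the diffusion model and the adaptive part entirely. The names `mu` and `scale` are hypothetical parameters.

```python
def momentum_guidance(grads, mu=0.9, scale=0.1):
    """Turn a sequence of per-step gradients into smoothed guidance steps.

    grads: list of gradient vectors (lists of floats), one per reverse step.
    mu:    momentum decay factor (hypothetical default).
    scale: guidance strength applied to the velocity (hypothetical default).
    """
    velocity = [0.0] * len(grads[0])
    steps = []
    for g in grads:
        # Normalize by the L1 norm so every step contributes at the same scale.
        l1 = sum(abs(x) for x in g) or 1.0
        velocity = [mu * v + x / l1 for v, x in zip(velocity, g)]
        steps.append([scale * v for v in velocity])
    return steps
```

Because the velocity carries over between steps, gradients that point the same way reinforce each other, while a single noisy gradient has limited influence on the update direction.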