GCC AI Research

Results for "SimulMask"

Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation

arXiv ·

This paper introduces SimulMask, a new paradigm for fine-tuning large language models (LLMs) for simultaneous translation. SimulMask utilizes a novel attention masking approach that models simultaneous translation during fine-tuning by masking attention for a desired decision policy. Applied to a Falcon LLM on the IWSLT 2017 dataset, SimulMask achieves improved translation quality compared to state-of-the-art prompting optimization strategies across five language pairs while reducing computational cost. Why it matters: The proposed method offers a more efficient way to adapt LLMs for real-time translation, potentially enhancing multilingual communication tools and services.
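To make the idea of masking attention for a decision policy concrete, here is a minimal illustrative sketch (an assumption for exposition, not the paper's exact SimulMask construction) using a simple wait-k policy: target step i may only attend to the source prefix revealed so far.

```python
def wait_k_attention_mask(src_len, tgt_len, k):
    """Illustrative wait-k attention mask (not the paper's exact method).

    Returns a tgt_len x src_len boolean matrix where entry [i][j] is True
    if target step i may attend to source token j. Under wait-k, step i
    sees only the first k + i source tokens, mimicking the partial source
    available during simultaneous translation.
    """
    mask = []
    for i in range(tgt_len):
        visible = min(src_len, k + i)  # source prefix revealed at step i
        mask.append([j < visible for j in range(src_len)])
    return mask

m = wait_k_attention_mask(src_len=5, tgt_len=3, k=2)
```

Applying such a mask during fine-tuning, rather than at inference only, is what lets the model learn under the same restricted context it will face in real-time translation.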

KVL releases new open source to visualize supercomputer simulations

KAUST ·

KAUST's Visualization Core Lab (KVL) has released inshimtu, a pseudo in situ visualization system for scientists working with large datasets and supercomputer simulations. Inshimtu simplifies the implementation of in situ visualization by using existing simulation output files without requiring changes to the simulation code. It helps scientists determine if implementing a full in situ visualization into their code is worthwhile. Why it matters: This open-source tool can improve the efficiency of supercomputing research in the region by allowing researchers to assess the value of in situ visualization before fully committing to it.

An AI trained to spot hidden objects can see through camouflage - New Scientist

Inception ·

Researchers at the University of Maryland have developed an AI system that can identify objects hidden by camouflage. The AI uses a convolutional neural network trained on synthetic data to detect partially occluded objects. The system outperformed existing object detection methods in tests on real-world images. Why it matters: The work demonstrates potential applications of AI in defense, security, and search and rescue operations in the Middle East and elsewhere.

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

arXiv ·

A new benchmark, ViMUL-Bench, is introduced to evaluate video LLMs across 14 languages, including Arabic, with a focus on cultural inclusivity. The benchmark includes 8k manually verified samples across 15 categories and varying video durations. A multilingual video LLM, ViMUL, is also presented, along with a training set of 1.2 million samples, with both to be publicly released.

KAUST President’s address on keeping the community safe

KAUST ·

KAUST is distributing five face masks to each member of the KAUST community who wants them. The university is also working with a social enterprise to produce fabric face masks and has started an effort to produce DIY reusable masks. KAUST encourages mask use when leaving the house, but emphasizes that masks should not distract from social distancing and hand washing. Why it matters: This initiative demonstrates KAUST's commitment to community health and safety during the COVID-19 pandemic, reflecting a proactive approach to public health within the institution.

Real-time Few-shot Realistic Avatars

MBZUAI ·

Ekaterina Radionova from Smarter AI (formerly Samsung AI Center) presented an approach to generating lifelike real-time avatars. The work focuses on producing high-quality video with authentic facial features fast enough to support online (real-time) generation. Radionova holds a master's degree in Data Science from Skoltech and a bachelor's degree in Applied Mathematics from the Moscow Institute of Physics and Technology. Why it matters: Achieving realistic real-time avatars is critical for applications in online communication, entertainment, and virtual reality within the region.

Multimodality for story-level understanding and generation of visual data

MBZUAI ·

Vicky Kalogeiton from École Polytechnique discussed the importance of multimodality for story-level recognition and generation, drawing on video, audio, text, masks, and clinical data. She presented work on multimodal video understanding using FunnyNet-W and the Short Film Dataset, and showed examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance). Why it matters: Multimodal AI research is growing globally, and this talk highlights the potential of combining different data types for enhanced understanding and generation, which could have implications for various applications, including those relevant to the Middle East.