The paper introduces MIRAGE, a framework for evaluating LLMs' ability to simulate human behaviors in murder mystery games. MIRAGE uses four evaluation methods (TII, CIC, ICI, and SCI) to assess LLMs' role-playing proficiency. Experiments show that even GPT-4 struggles with the complexities of the MIRAGE framework.
Fudan University's Zhongyu Wei presented research on social simulation driven by LLMs, covering individual and large-scale social movement simulation. Wei directs the Data Intelligence and Social Computing Lab (Fudan DISC) and has published extensively on multimodal large models and social computing. His work includes the Volcano multimodal model, DISC-MedLLM, and ElectionSim. Why it matters: Using LLMs for social simulation could provide new tools for understanding and potentially predicting social dynamics in the Arab world.
MBZUAI Professor Yoshihiko Nakamura discusses his career in robotics, dating back to the field's early days. He notes the initial skepticism towards robotics as an academic discipline in the 1970s and its gradual formalization. Nakamura's research is driven by the mathematics of movement, optimization, and non-linearity, drawing inspiration from neuroscience, psychology, and linguistics. Why it matters: Nakamura's insights provide a historical perspective on the evolution of robotics research and highlight the interdisciplinary nature of the field, with implications for the future of AI development in the region.
Maha Elgarf from NYU Abu Dhabi presented research on using social robots to stimulate creativity in children through subconscious mimicry, leveraging the 'chameleon effect'. The research involved a series of studies in which children engaged in storytelling with a social robot and their creativity was assessed. Elgarf also discussed the use of large language models (LLMs) in education and open challenges in the field. Why it matters: This explores innovative applications of social robotics and AI in education within the UAE, potentially enhancing children's learning and creativity.
MBZUAI researchers demonstrated a low-latency, multilingual multimodal AI system at GITEX that integrates speech, text, and visual capabilities for more lifelike human-machine conversation. The demo, led by Dr. Hisham Cholakkal, includes a mobile app where users can point their camera at an object and ask questions, receiving spoken answers in multiple languages. They are also integrating the model into a robot dog that can respond to voice commands. Why it matters: This work addresses key challenges in deploying LLMs to real-world applications in the Middle East, such as multilingual support and real-time responsiveness.
MBZUAI researchers developed MedAgentSim, a simulated hospital environment to evaluate AI diagnostic abilities. The simulation uses LLM-powered agents to mimic doctor-patient conversations, providing a dynamic assessment of diagnostic skills. The system includes doctor, patient, and evaluator agents that interact within the simulated hospital, making real-time decisions. Why it matters: This research offers a more realistic evaluation of AI in clinical settings, addressing limitations of current benchmarks and potentially improving AI's use in healthcare.
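The doctor/patient/evaluator structure described above can be sketched as a minimal agent loop. This is an illustrative assumption, not the authors' implementation: the agents here are scripted stubs standing in for LLM calls (class names like `PatientAgent` and the diagnosis rule are hypothetical), but the interaction pattern (the doctor interviews the patient, then the evaluator scores the resulting diagnosis) mirrors the setup the summary describes.

```python
# Minimal sketch of a MedAgentSim-style interaction loop with stub agents.
# In the real system, each agent's behavior would be produced by an LLM;
# here, scripted rules stand in so the loop is runnable end to end.

from dataclasses import dataclass, field


@dataclass
class PatientAgent:
    # Hidden case facts, revealed only when the doctor asks about the topic.
    case: dict

    def answer(self, question: str) -> str:
        for topic, fact in self.case.items():
            if topic in question.lower():
                return fact
        return "I'm not sure."


@dataclass
class DoctorAgent:
    questions: list
    notes: list = field(default_factory=list)

    def interview(self, patient: PatientAgent) -> str:
        # Gather information turn by turn, as in a doctor-patient dialogue.
        for q in self.questions:
            self.notes.append((q, patient.answer(q)))
        # A real system would let an LLM reason over the notes here;
        # this keyword rule is a placeholder for that decision step.
        if any("fever" in a for _, a in self.notes):
            return "influenza"
        return "unknown"


@dataclass
class EvaluatorAgent:
    ground_truth: str

    def score(self, diagnosis: str) -> float:
        return 1.0 if diagnosis == self.ground_truth else 0.0


case = {
    "symptoms": "I have a high fever and body aches.",
    "duration": "It started two days ago.",
}
patient = PatientAgent(case)
doctor = DoctorAgent(["What symptoms do you have?", "What is the duration?"])
evaluator = EvaluatorAgent("influenza")

diagnosis = doctor.interview(patient)
print(diagnosis, evaluator.score(diagnosis))  # → influenza 1.0
```

The key design point this illustrates is why such a simulation is a more dynamic benchmark than a static Q&A set: the doctor's diagnosis depends on which questions it chose to ask, so the evaluation measures the whole interactive decision process rather than a single answer.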