Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as reordering the answer choices of multiple-choice questions, can shift model rankings by up to eight positions. The study recommends hybrid scoring, warns against over-reliance on simple benchmark evaluations, and provides code for further research.
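The kind of perturbation studied can be illustrated with a toy sketch (our own construction, not the paper's released code): shuffling a question's answer choices changes which position holds the gold answer, so a model with positional or letter biases scores differently on semantically identical items.

```python
import random

def shuffle_choices(choices, gold_index, seed):
    """Return a permuted choice list and the gold answer's new position.

    Illustrative only: real benchmark-perturbation studies apply this kind
    of reshuffle across a whole evaluation set and compare rankings.
    """
    rng = random.Random(seed)
    order = list(range(len(choices)))
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    return shuffled, order.index(gold_index)

# The same item, two presentations: content identical, letter labels differ.
choices = ["Paris", "Rome", "Berlin", "Madrid"]
shuffled, new_gold = shuffle_choices(choices, gold_index=0, seed=7)
assert shuffled[new_gold] == "Paris"  # gold answer text is preserved
```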
KAUST researchers developed a machine learning algorithm to control a deformable mirror within the Subaru Telescope's exoplanet imaging camera, compensating for atmospheric turbulence. The algorithm, which computes a partial singular value decomposition (SVD), outperforms a standard SVD by a factor of four. The KAUST team received a best paper award at the PASC Conference for this work, which has already been deployed at the Subaru Telescope. Why it matters: This advancement enables sharper images of exoplanets, facilitating their identification and study, and showcases the impact of optimizing core linear algebra algorithms.
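The core idea of a partial SVD, computing only the k leading singular triplets instead of a full decomposition, can be sketched with a standard randomized method (after Halko et al.); this is a generic illustration, not the KAUST team's algorithm.

```python
import numpy as np

def partial_svd(A, k, oversample=10, seed=0):
    """Randomized partial SVD: approximate the k leading singular triplets."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Project onto a small random subspace that captures A's dominant range.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    # A full SVD of the small projected matrix is cheap.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: an exactly rank-5 matrix is recovered from its 5 leading triplets.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = partial_svd(A, k=5)
assert np.allclose((U * s) @ Vt, A, atol=1e-6)
```

For adaptive-optics control loops, the appeal of this family of methods is that only the dominant modes of the response matrix are needed, so most of the cost of a full decomposition can be avoided.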
A new framework is presented for constructing confidence sets over causal orderings in structural equation models (SEMs). It uses a residual bootstrap procedure to test the goodness-of-fit of candidate causal orderings, thereby quantifying uncertainty in causal discovery. The method is computationally efficient and suited to medium-sized problems, while maintaining theoretical guarantees as the number of variables grows. Why it matters: This adds a new dimension of uncertainty quantification that enhances the robustness and reliability of causal inference in complex systems.
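The overall shape of such a goodness-of-fit test can be sketched on a toy two-variable linear SEM. This is not the paper's procedure: for brevity a residual permutation stands in for the residual bootstrap, and a simple residual-vs-predictor dependence statistic is used. The idea is that under a correct ordering, regression residuals are independent of the preceding variables, so the observed statistic looks like its resampled null; under a wrong ordering it does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)            # non-Gaussian exogenous noise
y = 2.0 * x + rng.uniform(-1, 1, n)  # true ordering: x before y

def fit_pvalue(pred, resp, n_perm=200, rng=rng):
    """P-value for the ordering 'resp follows pred' via residual independence."""
    beta = np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1)
    resid = resp - beta * pred
    # Dependence statistic between residuals and predecessors.
    stat = abs(np.corrcoef(resid**2, pred**2)[0, 1])
    # Permutation surrogate for the null: break the residual/predictor link.
    null = [abs(np.corrcoef(rng.permutation(resid)**2, pred**2)[0, 1])
            for _ in range(n_perm)]
    return (1 + sum(s >= stat for s in null)) / (1 + n_perm)

p_true = fit_pvalue(x, y)   # ordering (x, y): should not be rejected
p_false = fit_pvalue(y, x)  # ordering (y, x): should be rejected
```

A confidence set in this spirit is then the collection of orderings whose p-value exceeds the chosen level alpha.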
Researchers from MBZUAI have proposed the Cylindrical Representation Hypothesis (CRH) to explain the instability and unpredictability observed in large language model steering. CRH relaxes the orthogonality assumption of the existing Linear Representation Hypothesis, positing a cylindrical structure where a central axis captures concept differences and a surrounding normal plane controls steering sensitivity. The hypothesis suggests that the intrinsic uncertainty in identifying specific sensitive sectors within this normal plane accounts for why steering outcomes frequently fluctuate even with well-aligned directions. Why it matters: This research offers a more robust theoretical framework for understanding and potentially improving the control and reliability of large language models.
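The geometric intuition can be illustrated with a small construction of our own (not the authors' code): two steering vectors can be identically well aligned with a central concept axis while their components in the orthogonal complement, the "normal plane" that CRH says governs steering sensitivity, are essentially unrelated.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
d = 64
axis = np.zeros(d)
axis[0] = 1.0  # stand-in for a central concept axis

# Two steering vectors: identical component along the axis, but different
# unit-norm components confined to the normal plane (coordinate 0 zeroed).
n1 = rng.standard_normal(d); n1[0] = 0.0; n1 = unit(n1)
n2 = rng.standard_normal(d); n2[0] = 0.0; n2 = unit(n2)
v1 = axis + 0.5 * n1
v2 = axis + 0.5 * n2

cos1 = unit(v1) @ axis
cos2 = unit(v2) @ axis
# Equally "well aligned" with the concept axis...
assert abs(cos1 - cos2) < 1e-9
# ...yet their normal-plane parts are nearly unrelated in high dimension.
assert abs(n1 @ n2) < 0.5
```

Under CRH, it is exactly this unconstrained normal-plane freedom that can make two apparently equivalent steering directions behave very differently.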
This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.
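The unsupervised domain adaptation setting described above, labeled source data plus unlabeled target data, can be made concrete with a classic baseline, CORAL-style correlation alignment; this sketch is illustrative and not taken from the article. Source features are whitened and re-colored to match the target feature covariance, using no target labels at all.

```python
import numpy as np

def coral(source, target, eps=1e-6):
    """Align source features to the target's second-order statistics."""
    def cov(X):
        Xc = X - X.mean(0)
        return Xc.T @ Xc / (len(X) - 1) + eps * np.eye(X.shape[1])
    def mat_pow(C, p):
        # Symmetric matrix power via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return (V * w**p) @ V.T
    whiten = mat_pow(cov(source), -0.5)  # remove source correlations
    color = mat_pow(cov(target), 0.5)    # impose target correlations
    return (source - source.mean(0)) @ whiten @ color + target.mean(0)

rng = np.random.default_rng(0)
src = rng.standard_normal((500, 3)) * [1.0, 2.0, 0.5]              # source domain
tgt = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))  # shifted domain
aligned = coral(src, tgt)
# After alignment, the source covariance matches the target's.
assert np.allclose(np.cov(aligned.T), np.cov(tgt.T), atol=1e-2)
```

A classifier trained on the aligned source features then faces far less covariate mismatch at test time; domain generalization methods pursue the same goal without ever seeing target data.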
KAUST Professor Raul Tempone, an expert in Uncertainty Quantification (UQ), has been appointed as an Alexander von Humboldt Professor at RWTH Aachen University in Germany. This professorship will enable him to further his research on mathematics for uncertainty quantification with new collaborators. Tempone believes the KAUST Strategic Initiative for Uncertainty Quantification (SRI-UQ) contributed to this award. Why it matters: This appointment enhances KAUST's visibility and facilitates cross-fertilization between European and KAUST research groups, benefiting both institutions and attracting talent.