This research evaluates LLMs such as ChatGPT, Llama, Aya, Jais, and ACEGPT on Arabic automated essay scoring (AES) using the AR-AES dataset. The study employs zero-shot, few-shot, and fine-tuning approaches together with a mixed-language prompting strategy. ACEGPT performed best among the LLMs, with a quadratic weighted kappa (QWK) of 0.67, while a smaller BERT model achieved 0.88. Why it matters: The study highlights the challenges LLMs face in processing Arabic and offers insights into improving their performance on Arabic NLP tasks.
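QWK is the agreement metric behind the 0.67 and 0.88 figures: it compares model scores with human scores and penalizes disagreements by the squared distance between them. Below is a minimal sketch of the standard computation, assuming integer scores on a known range. The function name is illustrative, and the study's own evaluation code may differ; scikit-learn's `cohen_kappa_score` with `weights="quadratic"` is a library alternative.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted kappa (QWK) between two lists of integer scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("the score range must contain at least two values")
    # Observed matrix: observed[i][j] counts essays with human score i and model score j.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    total = len(y_true)
    # Marginal score histograms, used to build the chance-agreement (expected) matrix.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty: grows with the squared distance between the two scores.
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        # Degenerate case: both raters gave every essay the same single score;
        # the formula is undefined, so return 1.0 by convention.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1.0 means perfect agreement, 0 means chance-level agreement, and negative values mean systematic disagreement. For example, `quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3)` returns 1.0.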
A new content improvement system addresses the randomness and incorrect statements in text generated by deep learning models such as GPT-3. It uses text mining to identify correct sentences and applies syntactic and semantic generalization to replace problematic elements. The system can substantially improve the factual correctness and meaningfulness of raw generated content. Why it matters: Improving the quality of automatically generated content is crucial for the reliability and trustworthiness of many AI applications.
This paper introduces an AI framework for autonomous assessment of student work, addressing policy gaps in academic practices. A survey of 117 academics from the UK, UAE, and Iraq reveals positive attitudes toward AI in education, particularly for autonomous assessment. The study also highlights a lack of awareness of modern AI tools among experienced academics, emphasizing the need for updated policies and training.
This paper introduces rational counterfactuals, a method for identifying the counterfactual that maximizes the attainment of a desired consequent. In support of rational decision-making, the approach seeks the antecedent that leads to a specified outcome. The theory is applied to find the values of variables such as Allies, Contingency, Distance, Major Power, Capability, Democracy, and Economic Interdependency that contribute to peace. Why it matters: The research provides a framework for analyzing and promoting conditions conducive to peace through counterfactual reasoning.
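The core idea can be pictured as an optimization: given a model that predicts the consequent (here, peace) from the variables, search over changes to the controllable variables for the antecedent that maximizes the predicted probability of the desired outcome. Below is a minimal sketch under loudly stated assumptions. The predictor is a hypothetical toy logistic model with placeholder weights covering only a subset of the variables, not anything from the paper, and the search is an exhaustive scan over a small discrete grid. The paper's actual model and optimizer may differ.

```python
import itertools
import math

# HYPOTHETICAL toy model: weights are illustrative placeholders, NOT values from the paper,
# and only a subset of the variables is modeled.
WEIGHTS = {"Allies": 1.2, "Democracy": 0.8, "EconomicInterdependency": 0.6, "Capability": -0.5}
BIAS = -0.3


def p_peace(x):
    """Toy logistic model of P(peace | x); stands in for a trained predictor."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def rational_counterfactual(factual, controllable, candidate_values):
    """Return the antecedent (the factual case with only the controllable
    variables changed) that maximizes the probability of the desired consequent."""
    best_x, best_p = dict(factual), p_peace(factual)
    # Exhaustively try every combination of candidate values for the controllable variables.
    for combo in itertools.product(*(candidate_values[k] for k in controllable)):
        x = dict(factual)
        x.update(zip(controllable, combo))
        p = p_peace(x)
        if p > best_p:
            best_x, best_p = x, p
    return best_x, best_p


factual = {"Allies": 0, "Democracy": 0, "EconomicInterdependency": 0, "Capability": 1}
best, prob = rational_counterfactual(
    factual, ["Allies", "Democracy"], {"Allies": [0, 1], "Democracy": [0, 1]}
)
```

With these placeholder weights, the search sets both Allies and Democracy to 1 and leaves the non-controllable Capability unchanged. Exhaustive search is only practical for a handful of discrete variables; larger or continuous spaces would need a proper optimizer.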
A new methodology uses LLMs to assess the factuality and bias of news outlets by emulating the criteria that professional fact-checkers apply. The approach builds prompts from these criteria, elicits LLM responses, and aggregates them into predictions. Experiments show improvements over baselines and include an error analysis by media popularity and region; the dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.
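The prompt-then-aggregate pipeline described above can be sketched as follows. Everything here is an assumption for illustration. The criteria questions, the prompt wording, the three-level label set, and majority voting as the aggregation rule are all hypothetical. The model call itself is omitted, and the released repository should be consulted for the actual prompts and aggregation.

```python
from collections import Counter

# HYPOTHETICAL examples of fact-checker-style criteria; not the paper's prompts.
CRITERIA = [
    "Does the outlet issue corrections when it publishes errors?",
    "Does the outlet cite primary sources for its claims?",
]

LABELS = ("low", "mixed", "high")


def build_prompts(outlet, criteria=CRITERIA):
    """One prompt per criterion, each asking for a single factuality label."""
    return [
        f"Consider the news outlet {outlet}. {question} "
        "Based on this, rate its factuality as low, mixed, or high. Answer with one word."
        for question in criteria
    ]


def aggregate_predictions(responses, labels=LABELS):
    """Majority vote over per-criterion LLM answers. Answers outside `labels`
    are ignored; ties go to the label listed first in `labels`."""
    valid = [r.strip().lower() for r in responses if r.strip().lower() in labels]
    if not valid:
        return None
    counts = Counter(valid)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label
```

For example, responses `["High", "high", "mixed"]` aggregate to `"high"`. Majority voting is just one simple choice. A weighted or learned aggregation could replace it without changing the prompting step.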
Ted Briscoe from the University of Cambridge discussed using machine learning and NLP to develop learning-oriented assessment (LOA) for non-native writers. The technology is used in Cambridge English courseware like Empower and Linguaskill, as well as Write and Improve. Briscoe is also the co-founder and CEO of iLexIR Ltd. Why it matters: Improving automated language assessment could significantly enhance online language learning platforms in the Arab world and beyond.
This paper investigates the intrinsic self-correction capabilities of LLMs, identifying model confidence as a key latent factor. Researchers developed an "If-or-Else" (IoE) prompting framework to guide LLMs in assessing their own confidence and improving self-correction accuracy. Experiments demonstrate that the IoE-based prompt enhances the accuracy of self-corrected responses, with code available on GitHub.
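To make the idea concrete, here is a minimal sketch of one round of intrinsic self-correction with an IoE-style prompt. The model first answers, then is re-prompted to keep the answer if it is confident and to revise it otherwise. The template wording is a hypothetical paraphrase and not the paper's exact prompt. `ask` is a hypothetical callable standing in for any LLM client.

```python
from typing import Callable

# HYPOTHETICAL paraphrase of an "If-or-Else" instruction; the paper's template may differ.
IOE_TEMPLATE = (
    "Question: {question}\n"
    "Your previous answer: {answer}\n"
    "If you are very confident in your answer, keep it unchanged. "
    "Else, reconsider the question and give a corrected answer.\n"
    "Reply with the final answer only."
)


def self_correct(question: str, ask: Callable[[str], str]) -> str:
    """Answer once, then re-prompt with the IoE-style instruction so the model
    uses its own confidence to decide whether to revise. Returns the final answer."""
    first = ask(question)
    return ask(IOE_TEMPLATE.format(question=question, answer=first)).strip()
```

Passing the model as a plain callable keeps the sketch independent of any particular API. A stub that returns a wrong answer first and a corrected one second is enough to exercise the control flow.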