Search

Results for "dataset"

TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns for Intrusion Detection

arXiv · Sep 14

Researchers introduce TII-SSRC-23, a new network intrusion detection dataset designed to improve the diversity and representation of modern network traffic for machine learning models. The dataset includes a range of traffic types and subtypes to address the limitations of existing datasets. Feature importance analysis and baseline experiments for supervised and unsupervised intrusion detection are also provided.

JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media

arXiv · May 20

Researchers have introduced JobArabi, a new large-scale corpus consisting of 20,528 Arabic job announcements collected from X between January 2024 and October 2025. The dataset was compiled using a linguistically informed query framework covering various Arabic recruitment expressions, offering metadata like timestamps and geolocation for detailed analysis. Quantitative analysis of JobArabi reveals sociolinguistic patterns, including persistent gendered hiring language, regional occupational demand variations, and emotional framing in recruitment messages. Why it matters: This corpus provides a valuable resource for research in Arabic NLP, computational social science, and digital labor studies, offering unique insights into labor market communication and linguistic change in the Arab world.