This README file (version 1) was generated on 2025-04-21 by the dataset authors.

GENERAL INFORMATION
-------------------

1. Title of dataset: Original and processed dataset of malware propagation in IoT networks with a SIR epidemiological model

2. Authorship

Name: Leticia Sainz-Villegas
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Matemáticas y Computación, Universidad de Burgos
Email: lsainz@ubu.es
ORCID: -

Name: Roberto Casado-Vara
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Matemáticas y Computación, Universidad de Burgos
Email: rccasado@ubu.es
ORCID: https://orcid.org/0000-0003-0198-696X

Name: Nuño Basurto
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: nbasurto@ubu.es
ORCID: https://orcid.org/0000-0001-7289-4689

Name: Carlos Cambra
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: ccbaseca@ubu.es
ORCID: https://orcid.org/0000-0001-5567-9194

Name: Daniel Urda
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: durda@ubu.es
ORCID: https://orcid.org/0000-0003-2662-798X

Name: Álvaro Herrero
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: ahcosio@ubu.es
ORCID: https://orcid.org/0000-0002-2444-5384

DESCRIPTION
-----------

1. Dataset language
English

2. Abstract
The dataset contains the data generated by an individual-based SIR model running on an IoT network, represented as a graph, for 20 time steps. It is designed for training graph-based AI models for malware propagation detection in IoT networks.

3. Keywords
Mathematical epidemiology, graph theory, malware propagation, data science, IoT network.

4. Date of data collection
April 2025

5. Date of dataset publication
April 2025

6. Funding
This publication is part of the AI4SECIoT project ("Artificial Intelligence for Securing IoT Devices"), funded by the National Cybersecurity Institute (INCIBE) and arising from a collaboration agreement signed between INCIBE and the University of Burgos. This initiative is carried out within the framework of the Recovery, Transformation and Resilience Plan funds, financed by the European Union (Next Generation). The Plan is the Government of Spain's roadmap for modernizing the Spanish economy, recovering economic growth and creating jobs, for a solid, inclusive and resilient economic reconstruction after the COVID-19 crisis, and for responding to the challenges of the next decade.

7. Geographic location/s of data collection
Escuela Politécnica Superior. Campus Río Vena. Avda. Cantabria, s/n. 09006 Burgos (Burgos), Spain

ACCESS INFORMATION
------------------

1. Dataset Creative Commons License
CC BY

2. Dataset DOI: https://doi.org/10.71486/r4de-dj18

3. Related publication
Sainz-Villegas, L., Casado-Vara, R., Basurto, N., Cambra, C., Urda, D., & Herrero, A. (2024, October). Understanding Malware Dynamics in IoT Networks: Dataset Construction Using Mathematical Epidemiology and Complex Networks. In International Conference on EUropean Transnational Education (pp. 237-246). Cham: Springer Nature Switzerland.

METHODOLOGICAL INFORMATION
--------------------------

To generate the dataset, an IoT network is modeled as a graph in which nodes represent devices and edges represent communication links. The network topology is created using models such as random geometric graphs or scale-free networks to simulate realistic IoT environments.

On this graph, which has 50 nodes, we simulate malware propagation using an individual-based SIR (Susceptible-Infected-Recovered) model over 20 discrete time steps. Initially, a small subset of nodes is set as infected, while the rest are susceptible. At each time step, infected nodes can transmit the malware to their neighbors with a fixed transmission probability (beta), and recover with a recovery probability (gamma).

At every time step, the state of each node (S, I, or R) is recorded, along with the structure of the graph. This produces a time series of graph-structured data suitable for training graph neural networks (GNNs) in tasks such as malware detection and prediction of infection spread.

FILE INFORMATION
----------------

- Dataset Contents
  - CSV Files (Raw simulation data):
    - SIR_dataset_b05_g03.csv
    - SIR_dataset_b06_g02.csv
    - SIR_dataset_b08_g015.csv

    Each file corresponds to a specific simulation with given parameters:
    - b (beta): infection rate (transmission probability)
    - g (gamma): recovery rate (recovery probability)

    For example, SIR_dataset_b05_g03.csv corresponds to beta = 0.5 and gamma = 0.3.

    Rows represent nodes (IoT devices), and columns represent time steps. The cell values denote the infection state of each node at each time step:
    - 0 -> Susceptible (blue)
    - 1 -> Infected (red)
    - 2 -> Recovered (green)

  - PNG Files (Visualizations):
    - Overview_SIR_dataset_b05_g03.png
    - Overview_SIR_dataset_b06_g02.png
    - Overview_SIR_dataset_b08_g015.png

    These heatmap-like images visualize the evolution of each node’s state over time for the corresponding simulation. Color codes:
    - Blue: Susceptible
    - Red: Infected
    - Green: Recovered
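As an illustration of the CSV layout described under FILE INFORMATION (rows are nodes, columns are time steps, cells are 0, 1 or 2), the following minimal Python sketch counts how many nodes are in each state at every time step. It assumes the files are comma-delimited and contain only the state codes, with no header row and no index column; this README does not specify those details, so the parsing may need adjusting. The function names and the small inline sample are hypothetical and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# State codes as documented in this README.
STATE_NAMES = {0: "S", 1: "I", 2: "R"}


def read_state_matrix(handle):
    """Parse rows of state codes, one row per node.

    Assumption: comma-delimited, no header row, no index column.
    """
    return [[int(float(cell)) for cell in row]
            for row in csv.reader(handle) if row]


def counts_per_step(matrix):
    """Count S, I and R nodes in each column (time step)."""
    result = []
    for column in zip(*matrix):
        tally = Counter(column)
        result.append({name: tally.get(code, 0)
                       for code, name in STATE_NAMES.items()})
    return result


# Made-up sample in the assumed layout: 3 nodes, 4 time steps.
sample = io.StringIO("1,1,2,2\n0,1,1,2\n0,0,0,0\n")
curve = counts_per_step(read_state_matrix(sample))
for step, counts in enumerate(curve):
    print(step, counts)

# For a real file (if the layout assumption holds):
# with open("SIR_dataset_b05_g03.csv", newline="") as f:
#     curve = counts_per_step(read_state_matrix(f))
```

If a file turns out to include a header row or a node-index column, the integer conversion in read_state_matrix will fail or miscount, and that row or column should be skipped before counting.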
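The procedure described under METHODOLOGICAL INFORMATION can be sketched in Python as follows. This is not the authors' generation code; it is a minimal standard-library illustration of an individual-based SIR process on a random geometric graph. The connection radius (0.25), the number of initially infected nodes (3), the random seed, the synchronous update order, and the choice to store the initial condition in the first column are all assumptions made for the example; only the node count (50), the number of time steps (20), the state codes, and the beta/gamma values come from this README.

```python
import math
import random

S, I, R = 0, 1, 2  # state codes used in the CSV files


def random_geometric_graph(n, radius, rng):
    """Place n nodes uniformly in the unit square and link every pair
    of nodes closer than `radius`. Returns an adjacency dictionary."""
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) < radius:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def simulate_sir(neighbors, beta, gamma, steps, n_seeds, rng):
    """Return a matrix of states: rows are nodes, columns are time steps."""
    n = len(neighbors)
    state = [S] * n
    for node in rng.sample(range(n), n_seeds):
        state[node] = I
    history = [state[:]]  # assumption: column 0 is the initial condition
    for _ in range(steps - 1):
        new_state = state[:]  # synchronous update (assumption)
        for node in range(n):
            if state[node] != I:
                continue
            # An infected node may infect each susceptible neighbor ...
            for nb in neighbors[node]:
                if state[nb] == S and rng.random() < beta:
                    new_state[nb] = I
            # ... and may itself recover at the end of the step.
            if rng.random() < gamma:
                new_state[node] = R
        state = new_state
        history.append(state[:])
    # Transpose from time-by-node to node-by-time.
    return [list(row) for row in zip(*history)]


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
graph = random_geometric_graph(50, 0.25, rng)
matrix = simulate_sir(graph, beta=0.5, gamma=0.3, steps=20, n_seeds=3, rng=rng)
print(len(matrix), "nodes x", len(matrix[0]), "time steps")
```

Because a node can only move from S to I and from I to R, each row of the resulting matrix is non-decreasing, which gives a quick sanity check when inspecting data in this layout.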