This README file was generated on 2026-03-20 by the dataset authors.

Title: Labeled IoT Window-Based Random Network Pattern Dataset for Reinforcement Learning

------------------------------------------------------------------------

GENERAL INFORMATION

1. Authors

César Rodríguez Villagrá\*\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: crvillagra@ubu.es\
ORCID: https://orcid.org/0009-0007-0122-7417

Sergio Martin Reizabal\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: smreizabal@ubu.es\
ORCID: https://orcid.org/0009-0001-3284-9100

Ruben Ruiz-Gonzalez\
Grupo de Investigación en Automatización, Robótica, Control y Optimización (ARCO),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, Burgos, 09006, Spain\
Email: ruben.ruiz@ubu.es\
ORCID: https://orcid.org/0000-0001-7006-2395

Nuño Basurto\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: nbasurto@ubu.es\
ORCID: https://orcid.org/0000-0001-7289-4689

Álvaro Herrero\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: ahcosio@ubu.es\
ORCID: https://orcid.org/0000-0002-2444-5384

\*Corresponding author

------------------------------------------------------------------------

2. Related publications

The datasets published here are derived from the original dataset by Neto, E. C. P., Dadkhah, S., Ferreira, R., Zohourian, A., Lu, R., & Ghorbani, A. A. (2023).
CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors, 23(13), 5941. From that dataset, we selected specific PCAP files (DDoS-TCP_Flood and Benign) and processed them according to our methodology. All modifications and processing steps are documented in this repository. The original work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

------------------------------------------------------------------------

3. Funding

This publication is part of the AI4SECIoT project ("Artificial Intelligence for Securing IoT Devices"), funded by the National Cybersecurity Institute (INCIBE) under a collaboration agreement signed between INCIBE and the University of Burgos. This initiative is carried out within the framework of the Recovery, Transformation and Resilience Plan funds, financed by the European Union (Next Generation), the project of the Government of Spain that outlines the roadmap for the modernization of the Spanish economy, the recovery of economic growth and job creation, for solid, inclusive and resilient economic reconstruction after the COVID-19 crisis, and to respond to the challenges of the next decade.

------------------------------------------------------------------------

DESCRIPTION

1. Dataset language

English

------------------------------------------------------------------------

2. Abstract

This dataset is designed to support the training and evaluation of reinforcement learning models in the context of network traffic analysis. It is derived from an existing IoT network traffic dataset, from which packet capture (pcap) files were selected and processed following a custom methodology explained in [Methodological Information](#methodological-information). The resulting data representation is based on a windowing approach, where network traffic is segmented into fixed-size temporal windows.
Each window aggregates traffic instances and is labeled according to its composition as benign, attack, or mixed (containing both benign and malicious activity). The final datasets are generated through random combinations of these windows, enabling the creation of diverse traffic patterns that better reflect dynamic and random network conditions. This structure facilitates the use of the dataset in reinforcement learning scenarios, where agents must learn to identify, classify, or respond to varying traffic behaviors over time. Additionally, the evaluation datasets are generated following the same methodology as the training datasets, but are kept separate and are not used during the training process, allowing for an independent evaluation of model performance.

------------------------------------------------------------------------

3. Keywords

Internet of Things; Cybersecurity; Attack Traffic; Benign Traffic; Network; DDoS; DoS; Reinforcement Learning

------------------------------------------------------------------------

ACCESS INFORMATION

1. License

Creative Commons Attribution 4.0 International (CC BY 4.0)

------------------------------------------------------------------------

2. Repository and persistent identifier

Repository name: Labeled IoT Window-Based Random Network Pattern Dataset for Reinforcement Learning.\
Persistent identifier (DOI): https://doi.org/10.71486/pzvm-3z31\
Direct URL: https://hdl.handle.net/10259/11497\
Related publication:

------------------------------------------------------------------------

METHODOLOGICAL INFORMATION

1. Capture environment

The original data was obtained from a publicly available IoT network traffic dataset (CICIoT2023), captured under realistic conditions including both benign activity and multiple attack scenarios. Traffic was recorded in pcap format, preserving full packet-level information.
For this work, a subset of the original pcap files was selected without modification and used as the basis for further processing. The selection includes both benign and malicious traffic to ensure representative network behavior for subsequent dataset generation.

------------------------------------------------------------------------

2. Data processing stages

    1. Extract the DDoS-TCP_Flood and Benign PCAPs from the CICIoT2023 dataset.
    2. Drop the non-IPv4 packets.
    3. Merge each set of PCAPs into a single PCAP file: one for DDoS-TCP_Flood and one for Benign.
    4. Keep only traffic incoming to the network, in this case packets whose destination IP address starts with 192.168.
    5. Keep only packets whose source is among the 3 most frequent source IP addresses.
    6. Keep only packets whose destination is the most frequent destination IP address.
    7. Group the packets into windows of 0.01 seconds.
    8. Remove windows that contain no packets or whose packet count is lower than 5% of the larger of the mean and the median packet count, in order to reduce the presence of low-activity or non-informative windows.
    9. Generate the final datasets. For each window, a set is randomly chosen with equal probability from the previously generated types: benign, attack, or mixed. A window is then randomly selected from the chosen set. In the case of mixed sets, one benign window and one attack window are randomly selected and combined, preserving their relative temporal order. Each generated dataset contains 100 windows.

------------------------------------------------------------------------

3. Intended use

The dataset is intended for research on reinforcement learning for the mitigation of DoS/DDoS attacks in IoT networks. It includes one set for training and a separate set for evaluation, so that metrics are not computed on data the model has been trained on.

------------------------------------------------------------------------

4. Limitations

It is important to note that the random windowing methodology described is not intended to be universally applicable across all machine learning approaches. Instead, it is specifically designed for reinforcement learning settings, where the stochastic construction of windows promotes better generalization to previously unseen states or value distributions.

------------------------------------------------------------------------

DIRECTORY AND FILE INFORMATION

Repository structure

    train/
    evaluation/

------------------------------------------------------------------------

File information

Each row of a file describes one packet: the normalized source and destination IP (each address is replaced by an integer starting at 0), the TTL of the packet, its size, its relative time, and the label (type).

Column   Type             Description
-------- ---------------- ---------------------------------------------------------------------------
ip_src   int              Source IP: number of the host that originated the packet
ip_dst   int              Destination IP: number of the host receiving the packet
ttl      int, 0-255       Time to Live (TTL): maximum number of hops before the packet is discarded
size     int, 20-65535    Size: total size of the packet including headers and payload (bytes)
t_rel    float            Relative time: time measured from the start point
type     string           Type of the packet: attack or benign

Directory descriptions

Parquet is a column-oriented file format that stores data efficiently using compression and encoding.

Each dataset is labeled as: dataset\_dst\_src.parquet

Where NumberDestinations is 1 and NumberSources is 3, as previously mentioned.

**train/**\
200 parquet files used for training, with ids from 0 to 199.

         unique   top
-------- -------- --------
ip_src   3        0
ip_dst   1        0
type     2        attack

        mean     std      median
------- -------- -------- --------
ttl     68.34    26.61    64
size    60.19    7.39     60
t_rel   247.28   144.67   245

Total packet count: 2044557.

**evaluation/**\
100 parquet files used for model evaluation and metrics generation, with ids from 0 to 99.

         unique   top
-------- -------- --------
ip_src   3        0
ip_dst   1        0
type     2        attack

        mean     std      median
------- -------- -------- --------
ttl     68.24    26.33    64
size    60.17    5.58     60
t_rel   249.96   144.26   250

Total packet count: 1032562.

Dataset Version 1.0

------------------------------------------------------------------------

CONTACT

For further information, please contact:

César Rodríguez Villagrá\
Email: crvillagra@ubu.es
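
------------------------------------------------------------------------

EXAMPLE: WINDOW PROCESSING SKETCH

The following is a minimal, illustrative sketch of stages 7 to 9 listed under "Data processing stages" (windowing, low-activity filtering, and random dataset assembly). It is not the authors' code. All function and constant names are hypothetical, packets are represented as plain dictionaries carrying the `t_rel` and `type` columns described above, and several details are assumptions: `t_rel` is taken to be in seconds, the mean and median are computed over non-empty windows only, and "preserving their relative temporal order" is interpreted as ordering packets by their offset from the start of their own window.

```python
import math
import random
from collections import defaultdict
from statistics import mean, median

WINDOW_SECONDS = 0.01       # stage 7: window width (t_rel assumed to be in seconds)
MIN_FRACTION = 0.05         # stage 8: 5% activity threshold
WINDOWS_PER_DATASET = 100   # stage 9: windows per generated dataset


def group_into_windows(packets, width=WINDOW_SECONDS):
    """Stage 7: bucket packets (dicts with a 't_rel' key) into time windows.

    Returns the non-empty windows in temporal order.
    """
    buckets = defaultdict(list)
    for pkt in packets:
        # round() before floor() absorbs float error such as 0.03 / 0.01 == 2.99...
        index = math.floor(round(pkt["t_rel"] / width, 9))
        buckets[index].append(pkt)
    return [buckets[i] for i in sorted(buckets)]


def drop_low_activity(windows, fraction=MIN_FRACTION):
    """Stage 8: drop empty windows and windows below the activity threshold.

    Assumption: mean and median are taken over the non-empty windows.
    """
    counts = [len(w) for w in windows if w]
    if not counts:
        return []
    threshold = fraction * max(mean(counts), median(counts))
    return [w for w in windows if w and len(w) >= threshold]


def combine_mixed(benign_window, attack_window):
    """Stage 9 (mixed case): merge one benign and one attack window.

    Assumption: packets are ordered by their offset from the first packet
    of their own window.
    """
    def with_offsets(window):
        start = min(p["t_rel"] for p in window)
        return [(p["t_rel"] - start, p) for p in window]

    merged = sorted(with_offsets(benign_window) + with_offsets(attack_window),
                    key=lambda pair: pair[0])
    return [pkt for _, pkt in merged]


def build_dataset(benign_windows, attack_windows,
                  n_windows=WINDOWS_PER_DATASET, rng=None):
    """Stage 9: pick a window type uniformly at random, then a window of that type."""
    if rng is None:
        rng = random.Random()
    dataset = []
    for _ in range(n_windows):
        kind = rng.choice(("benign", "attack", "mixed"))
        if kind == "benign":
            window = rng.choice(benign_windows)
        elif kind == "attack":
            window = rng.choice(attack_windows)
        else:
            window = combine_mixed(rng.choice(benign_windows),
                                   rng.choice(attack_windows))
        dataset.append((kind, window))
    return dataset
```

Passing a seeded `random.Random` instance as `rng` makes a generated sequence of windows reproducible, which may be convenient when comparing reinforcement learning runs.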