This README file was generated on 2026-03-20 by the dataset authors.
 
Title: Labeled IoT Window-Based Random Network Pattern Dataset for Reinforcement Learning

------------------------------------------------------------------------

 GENERAL INFORMATION

 1. Authors

César Rodríguez Villagrá\*\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: crvillagra@ubu.es ORCID: https://orcid.org/0009-0007-0122-7417

Sergio Martin Reizabal\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: smreizabal@ubu.es ORCID: https://orcid.org/0009-0001-3284-9100

Ruben Ruiz-Gonzalez\
Grupo de Investigación en Automatización, Robótica, Control y
Optimización (ARCO), Departamento de Digitalización, Escuela Politécnica
Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: ruben.ruiz@ubu.es ORCID: https://orcid.org/0000-0001-7006-2395

Nuño Basurto\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: nbasurto@ubu.es ORCID: https://orcid.org/0000-0001-7289-4689

Álvaro Herrero\
Grupo de Inteligencia Computacional Aplicada (GICAP),\
Departamento de Digitalización, Escuela Politécnica Superior,\
Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos, Spain\
Email: ahcosio@ubu.es ORCID: https://orcid.org/0000-0002-2444-5384

\*Corresponding author

------------------------------------------------------------------------

 2. Related publications

The datasets published here are derived from the CICIoT2023 dataset:
Neto, E. C. P., Dadkhah, S., Ferreira, R., Zohourian, A., Lu, R., &
Ghorbani, A. A. (2023). CICIoT2023: A Real-Time Dataset and Benchmark
for Large-Scale Attacks in IoT Environment. Sensors, 23(13), 5941.
<https://doi.org/10.3390/s23135941>. From it, we selected specific PCAP
files (DDoS-TCP_Flood and Benign) and processed them according to our
methodology. All modifications and processing steps are documented in
this repository. The original work is licensed under the Creative
Commons Attribution 4.0 International (CC BY 4.0) license.

------------------------------------------------------------------------

 3. Funding

This publication is part of the AI4SECIoT project ("Artificial
Intelligence for Securing IoT Devices"), funded by the National
Cybersecurity Institute (INCIBE) under a collaboration agreement
signed between INCIBE and the University of Burgos. This initiative is
carried out within the
framework of the Recovery, Transformation and Resilience Plan funds,
financed by the European Union (Next Generation), the project of the
Government of Spain that outlines the roadmap for the modernization of
the Spanish economy, the recovery of economic growth and job creation,
for solid, inclusive and resilient economic reconstruction after the
COVID-19 crisis, and to respond to the challenges of the next decade.

------------------------------------------------------------------------

 DESCRIPTION

 1. Dataset language

English

------------------------------------------------------------------------

 2. Abstract

This dataset is designed to support the training and evaluation of
reinforcement learning models in the context of network traffic
analysis. It is derived from an existing IoT network traffic dataset,
from which packet capture (pcap) files were selected and processed
following a custom methodology explained in the Methodological
Information section. The resulting data
representation is based on a windowing approach, where network traffic
is segmented into fixed-size temporal windows.

Each window aggregates traffic instances and is labeled according to its
composition as benign, attack, or mixed (containing both benign and
malicious activity). The final datasets are generated through random
combinations of these windows, enabling the creation of diverse traffic
patterns that better reflect dynamic and random network conditions.

This structure facilitates the use of the dataset in reinforcement
learning scenarios, where agents must learn to identify, classify, or
respond to varying traffic behaviors over time. Additionally, the
evaluation datasets are generated following the same methodology as the
training datasets, but are kept separate and are not used during the
training process, allowing for an independent evaluation of model
performance.

------------------------------------------------------------------------

 3. Keywords

Internet of Things; Cybersecurity; Attack Traffic; Benign Traffic;
Network; DDoS; DoS; Reinforcement Learning

------------------------------------------------------------------------

 ACCESS INFORMATION

 1. License

Creative Commons Attribution 4.0 International (CC BY 4.0)

------------------------------------------------------------------------

 2. Repository and persistent identifier

Repository name: Labeled IoT Window-Based Random Network Pattern Dataset
for Reinforcement Learning.

Persistent identifier (DOI): https://doi.org/10.71486/pzvm-3z31

Direct URL: https://hdl.handle.net/10259/11497

Related publication:

------------------------------------------------------------------------

 METHODOLOGICAL INFORMATION

 1. Capture environment

The original data was obtained from a publicly available IoT network
traffic dataset (CICIoT2023), captured under realistic conditions
including both benign activity and multiple attack scenarios. Traffic
was recorded in pcap format, preserving full packet-level information.

For this work, a subset of the original pcap files was selected without
modification and used as the basis for further processing. The selection
includes both benign and malicious traffic to ensure representative
network behavior for subsequent dataset generation.

------------------------------------------------------------------------

 2. Data processing stages

1.  Extract the DDoS-TCP_Flood and Benign PCAP files from the
    CICIoT2023 dataset.
2.  Drop all non-IPv4 packets.
3.  Merge each set of PCAP files into a single PCAP file: one for
    DDoS-TCP_Flood and one for Benign.
4.  Keep only the traffic entering the network, in this case the
    packets whose destination IP address starts with 192.168.
5.  Keep only the packets whose source is among the 3 most frequent
    source IP addresses.
6.  Keep only the packets whose destination is the most frequent
    destination IP address.
7.  Group the packets into windows of 0.01 seconds.
8.  Remove the windows that contain no packets or whose packet count is
    below 5% of the larger of the mean and the median packet count, in
    order to reduce the presence of low-activity or non-informative
    windows.
9.  Generate the final datasets. For each window of a dataset, one of
    the three window types (benign, attack, or mixed) is chosen at
    random with equal probability, and a window is then drawn at random
    from the set of that type. For mixed windows, one benign window and
    one attack window are drawn at random and combined, preserving
    their relative temporal order. Each generated dataset contains 100
    windows.
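Stages 7 to 9 can be sketched in plain Python as shown below. This is only an illustrative sketch, not the code used to build the dataset. The function names, the representation of packets as dictionaries with `t_rel` and `type` keys, the choice to compute the mean and median over non-empty windows only, and the re-basing of timestamps to the start of each window before merging are all assumptions made for the example.

```python
import random
from collections import defaultdict
from statistics import mean, median

WINDOW_SECONDS = 0.01       # stage 7: window width
MIN_FRACTION = 0.05         # stage 8: 5% threshold
WINDOWS_PER_DATASET = 100   # stage 9: windows per generated dataset


def group_into_windows(packets, width=WINDOW_SECONDS):
    """Stage 7: bucket packets into fixed-width windows by their t_rel value."""
    buckets = defaultdict(list)
    for pkt in packets:
        buckets[int(pkt["t_rel"] // width)].append(pkt)
    return [buckets[key] for key in sorted(buckets)]


def drop_sparse_windows(windows, fraction=MIN_FRACTION):
    """Stage 8: drop empty windows and windows below the packet-count threshold."""
    counts = [len(w) for w in windows if w]
    if not counts:
        return []
    # Assumption: mean and median are taken over the non-empty windows.
    threshold = fraction * max(mean(counts), median(counts))
    return [w for w in windows if w and len(w) >= threshold]


def rebase(window):
    """Shift timestamps so the first packet of the window is at offset 0 (assumption)."""
    start = min(pkt["t_rel"] for pkt in window)
    return [{**pkt, "t_rel": pkt["t_rel"] - start} for pkt in window]


def merge_mixed(benign_window, attack_window):
    """Combine one benign and one attack window, ordered by offset in the window."""
    merged = rebase(benign_window) + rebase(attack_window)
    return sorted(merged, key=lambda pkt: pkt["t_rel"])


def generate_dataset(benign, attack, n_windows=WINDOWS_PER_DATASET, rng=None):
    """Stage 9: draw n_windows windows, choosing each type with equal probability."""
    rng = rng or random.Random()
    dataset = []
    for _ in range(n_windows):
        kind = rng.choice(["benign", "attack", "mixed"])
        if kind == "benign":
            window = rebase(rng.choice(benign))
        elif kind == "attack":
            window = rebase(rng.choice(attack))
        else:
            window = merge_mixed(rng.choice(benign), rng.choice(attack))
        dataset.append((kind, window))
    return dataset
```

Passing a seeded `random.Random` instance as `rng` makes a generated dataset reproducible. How the windows are laid out on the final `t_rel` axis is not covered by this sketch.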

------------------------------------------------------------------------

 3. Intended use

The dataset is intended for research on reinforcement learning for the
mitigation of DoS/DDoS attacks in IoT networks. It includes one set for
training and a separate set for evaluation, so that metrics are not
computed on data on which the model has been trained.

------------------------------------------------------------------------

 4. Limitations

It is important to note that the random windowing methodology described
is not intended to be universally applicable across all machine learning
approaches. Instead, it is specifically designed for reinforcement
learning settings, where the stochastic construction of windows promotes
better generalization to previously unseen states or value
distributions.

------------------------------------------------------------------------

 DIRECTORY AND FILE INFORMATION

 Repository structure

train/\
evaluation/

------------------------------------------------------------------------

 File information

Each row of a file describes one packet: the normalized source and
destination IP addresses (each address is replaced by an integer
identifier starting at 0), the TTL of the packet, its size, its relative
time, and its label (type).

  --------------------------------------------------------------------------
  Column   Type       Description
  -------- ---------- ------------------------------------------------------
  ip_src   int        Source IP: Number of the host that originated the
                      packet

  ip_dst   int        Destination IP: Number of the host receiving the
                      packet

  ttl      int 0-255  Time to Live (TTL): Maximum number of hops before
                      packet discard

  size     int        Size: Total size of the packet including headers and
           20-65535   payload (bytes)

  t_rel    float      Relative time: Time measured from start point

  type     string     Type of the packet: attack or benign
  --------------------------------------------------------------------------
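The column constraints in the table above can be checked with a small validator such as the one below. It is a minimal sketch that assumes each row is available as a plain Python dictionary keyed by column name; the function name `validate_row` is hypothetical and not part of the dataset tooling.

```python
ALLOWED_TYPES = {"attack", "benign"}


def validate_row(row):
    """Return a list of problems found in one packet row (empty if it is valid)."""
    problems = []
    # ip_src and ip_dst are integer host identifiers starting at 0.
    for col in ("ip_src", "ip_dst"):
        value = row.get(col)
        if not isinstance(value, int) or value < 0:
            problems.append(f"{col}: expected a non-negative int, got {value!r}")
    ttl = row.get("ttl")
    if not isinstance(ttl, int) or not 0 <= ttl <= 255:
        problems.append(f"ttl: expected an int in 0-255, got {ttl!r}")
    size = row.get("size")
    if not isinstance(size, int) or not 20 <= size <= 65535:
        problems.append(f"size: expected an int in 20-65535, got {size!r}")
    t_rel = row.get("t_rel")
    if not isinstance(t_rel, (int, float)):
        problems.append(f"t_rel: expected a number, got {t_rel!r}")
    if row.get("type") not in ALLOWED_TYPES:
        problems.append(f"type: expected 'attack' or 'benign', got {row.get('type')!r}")
    return problems
```

Values read through a dataframe library may arrive as library-specific integer types rather than Python `int`, so they may need converting before this check is applied.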

 Directory descriptions

Parquet is a column-oriented file format that stores data efficiently
using compression and encoding.

Each dataset file is named following this pattern:

dataset\<Id\>\_dst\<NumberDestinations\>\_src\<NumberSources\>.parquet

where NumberDestinations is 1 and NumberSources is 3, matching the
filters applied in the data processing stages.
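The naming pattern can be built and parsed with a regular expression, as in the sketch below. The helper names `build_name` and `parse_name` are hypothetical, and the sketch assumes that the identifier is written without zero padding, which this README does not state.

```python
import re

# dataset<Id>_dst<NumberDestinations>_src<NumberSources>.parquet
NAME_PATTERN = re.compile(
    r"^dataset(?P<id>\d+)_dst(?P<dst>\d+)_src(?P<src>\d+)\.parquet$"
)


def build_name(dataset_id, n_dst=1, n_src=3):
    """Build a file name; the defaults match the values used in this dataset."""
    return f"dataset{dataset_id}_dst{n_dst}_src{n_src}.parquet"


def parse_name(name):
    """Return the id, dst and src fields as ints, or None if the name does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}
```

For example, `build_name(0)` gives `dataset0_dst1_src3.parquet`, and `parse_name` recovers the three numeric fields from such a name.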

**train/**\
200 parquet files used for training, with ids from 0 to 199.

           unique   top
  -------- -------- --------
  ip_src   3        0
  ip_dst   1        0
  type     2        attack

          mean     std      median
  ------- -------- -------- --------
  ttl     68.34    26.61    64
  size    60.19    7.39     60
  t_rel   247.28   144.67   245

Total packet count: 2044557.

**evaluation/**\
100 parquet files used for model evaluation and metrics generation, with
ids from 0 to 99.

           unique   top
  -------- -------- --------
  ip_src   3        0
  ip_dst   1        0
  type     2        attack

          mean     std      median
  ------- -------- -------- --------
  ttl     68.24    26.33    64
  size    60.17    5.58     60
  t_rel   249.96   144.26   250

Total packet count: 1032562.
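Summaries with the same layout as the tables above (unique/top for categorical columns; mean/std/median for numeric columns) can be recomputed from column values with the standard library, as sketched below. The helper names are hypothetical. This README does not say whether the reported standard deviation is the sample or the population variant; the sketch uses the sample standard deviation as an assumption.

```python
from collections import Counter
from statistics import mean, median, stdev


def describe_numeric(values):
    """Mean, sample standard deviation and median (needs at least two values)."""
    values = list(values)
    return {"mean": mean(values), "std": stdev(values), "median": median(values)}


def describe_categorical(values):
    """Number of distinct values and the most frequent one."""
    counts = Counter(values)
    top, _ = counts.most_common(1)[0]
    return {"unique": len(counts), "top": top}
```

On the full splits, which contain millions of packets, the aggregation functions of a dataframe library are likely to be much faster than these pure-Python helpers.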

 Dataset Version

1.0

------------------------------------------------------------------------

 CONTACT

For further information, please contact:

César Rodríguez Villagrá\
Email: crvillagra@ubu.es