This README file (version 1) was generated on 2025-01-16 by the dataset authors


GENERAL INFORMATION
-------------------
1. Title of dataset: Original and processed dataset of Batavia and Sarga woven fabric images
2. Authorship

Name: Beatriz Gil Arroyo
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: bgarroyo@ubu.es
ORCID: https://orcid.org/0009-0009-8499-093X

Name: Nuria Velasco-Pérez
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: nuriavp@ubu.es
ORCID: https://orcid.org/0009-0008-2988-757X

Name: Nuño Basurto
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: nbasurto@ubu.es
ORCID: https://orcid.org/0000-0001-7289-4689

Name: Juan Marcos Sanz
Institution: Textil Santanderina N-634, km 43 Cabezón de la Sal, 39500, Spain
Email: juanmarcos@tsanta.es
ORCID: https://orcid.org/0000-0002-2024-9909

Name: Angel Arroyo
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: aarroyop@ubu.es
ORCID: https://orcid.org/0000-0002-1614-9075

Name: Daniel Urda
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: durda@ubu.es
ORCID: https://orcid.org/0000-0003-2662-798X

Name: Álvaro Herrero
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: ahcosio@ubu.es
ORCID: https://orcid.org/0000-0002-2444-5384

DESCRIPTION
-----------
1. Dataset language
English

2. Abstract
The dataset contains images of Batavia and Sarga woven fabrics. It is designed for training AI models for defect detection and quality assessment.

3. Keywords
Textile defect detection, Fabric quality control, Industry 4.0, Deep Learning in manufacturing, Convolutional neural networks, Image analysis and classification.

4. Date of data collection
November 2022

5. Date of dataset publication
January 2025

6. Funding
This project was funded by the DECENT (Deep Learning for automatic Textile Inspection) initiative under the DIH-World 2nd Open Call framework. The authors thank INADE for their collaboration in acquiring the images.

7. Geographic location/s of data collection
Textil Santanderina N-634, km 43 Cabezón de la Sal, 39500, Spain


ACCESS INFORMATION
------------------
1. Dataset Creative Commons License

2. Dataset DOI

3. Related publication
(Work in progress)


METHODOLOGICAL INFORMATION
--------------------------
Textil Santanderina has equipped its facility with a Basler raL camera featuring the Awaiba DR-12k-3.5 CMOS sensor, which delivers a line rate of 8 kHz at a resolution of 12k. The camera operates in the VIS-NIR band and is fitted with Basler proprietary optics, giving a spatial resolution of 10 px/mm. For uniform illumination, an array of 360 infrared (IR) LEDs emitting at 850 nm and spanning a field of 15 mm x 1510 mm is positioned 20 cm above the fabric at a 15° angle of incidence. This IR illumination ensures compatibility with the existing D65 standard illumination systems used for visual inspection. The camera is synchronized with the fabric by a 10-bit inductive encoder to ensure high-precision (<1/10 mm) positioning. Lighting conditions, setup geometry, optics, and CMOS sensor specifications were chosen to meet additional requirements for defect inspection, including fabric width, batch length, and inspection speed.

The experiments were conducted on two different woven fabrics (Batavia and Sarga). These textiles were selected for their distinctive characteristics and their relevance in various textile applications. Including both in the study allows the proposed methods to be assessed across a broader range of textile inspection scenarios. It also allows the performance of machine learning algorithms to be compared under varying conditions, which improves the understanding of their ability to address specific challenges in the textile industry.

In an initial preprocessing step, the original 16-bit images were rescaled to 8 bits, reducing the number of possible values per pixel from 65,536 to 256. Then, to support fine-grained defect detection and to enlarge the dataset, each 8-bit 2048x696 image was partitioned into twelve patches of 365x365 pixels, with small overlaps along both the horizontal and vertical axes.
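The two preprocessing steps above can be sketched as follows. This is a minimal illustration and not the authors' code: this README does not state the exact rescaling formula or the patch offsets, so the bit-shift rescale, the evenly spaced offsets, and the 0-based (row, col) indexing are all assumptions, and the function names are hypothetical.

```python
# Sketch only: the shift-based rescale and the evenly spaced patch offsets
# are assumptions; the README gives sizes and counts, not the exact method.

IMAGE_W, IMAGE_H = 2048, 696   # original image size (from the README)
PATCH = 365                    # patch side in pixels (from the README)
ROWS, COLS = 2, 6              # 2 x 6 grid = 12 patches (from the README)


def to_8bit(value16: int) -> int:
    """Map a 16-bit value (0..65535) to 8 bits (0..255) by dropping the low byte."""
    return value16 >> 8


def patch_starts(length: int, count: int, size: int = PATCH) -> list[int]:
    """Evenly spaced start offsets so that `count` patches span `length` exactly."""
    step = (length - size) / (count - 1)
    return [round(i * step) for i in range(count)]


def patch_boxes() -> dict[tuple[int, int], tuple[int, int, int, int]]:
    """Return {(row, col): (left, upper, right, lower)} for the 12 patches."""
    ys = patch_starts(IMAGE_H, ROWS)
    xs = patch_starts(IMAGE_W, COLS)
    return {
        (r, c): (x, y, x + PATCH, y + PATCH)
        for r, y in enumerate(ys)
        for c, x in enumerate(xs)
    }


boxes = patch_boxes()
print(len(boxes))      # patches per original image
print(boxes[(1, 5)])   # box of the bottom-right patch
```

Because 2 x 365 exceeds 696 and 6 x 365 exceeds 2048, any 2x6 grid of 365x365 patches must overlap on both axes, which matches the description above. The boxes use the (left, upper, right, lower) convention, so they could be passed to an image library's crop function.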


FILE INFORMATION
----------------

The folders in the dataset are organized by type of woven fabric:
- BATAVIA: contains the images of Batavia fabrics.
- SARGA: contains the images of Sarga fabrics.

All images are in PNG format.

The BATAVIA and SARGA folders each contain the following directory structure:

- Originals: contains the 2048x696 8-bit images.

- Patches: contains the 365x365 cropped images, which overlap slightly along both the horizontal and vertical axes, organized as follows:
	- Cases: contains patches with defects.
	- Controls: contains patches without defects.
	- info_patches: a CSV file with 4 columns:
		- image_name: the name of the original image from which the patch was derived.
		- patch_name: the name of the patch.
		- ground_truth: the actual condition as labeled by experts (0 if no defect, 1 if defect present).
		- operator_label: the operator's decision (0 if no defect, 1 if defect present), which serves as the baseline against which AI models are evaluated.
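As an illustration of how the two label columns can be compared, the sketch below reads a file with the four columns listed above and computes the operator's sensitivity and specificity against ground_truth. It assumes a comma-delimited file with a header row; the function names are hypothetical, and the demo rows are invented for the example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def label_pairs(csv_file) -> Counter:
    """Count (ground_truth, operator_label) pairs from an open CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(int(row["ground_truth"]), int(row["operator_label"]))] += 1
    return counts


def sensitivity_specificity(counts: Counter) -> tuple[float, float]:
    """Operator sensitivity and specificity, taking ground_truth as reference."""
    tp, fn = counts[(1, 1)], counts[(1, 0)]
    tn, fp = counts[(0, 0)], counts[(0, 1)]
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented rows, only to show the expected column layout.
demo = io.StringIO(
    "image_name,patch_name,ground_truth,operator_label\n"
    "img_a,img_a_p1,1,1\n"
    "img_a,img_a_p2,1,0\n"
    "img_b,img_b_p1,0,0\n"
    "img_b,img_b_p2,0,0\n"
    "img_b,img_b_p3,0,1\n"
)
demo_counts = label_pairs(demo)
demo_sens, demo_spec = sensitivity_specificity(demo_counts)
print(demo_sens, demo_spec)
```

To use it on the real data, open the info_patches file with `open(path, newline="")` and pass the file object to `label_pairs`. The same two metrics computed for a model's predictions can then be compared with the operator baseline.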

The PNG files in the dataset are named as follows:
Original images: numberid_originalFolder_8bit.png (referred to below as originalname)
	- numberid is a string that identifies the image.
	- originalFolder is a number (0, 1 or 2) that identifies the folder in which Textil Santanderina originally delivered the images.
Patches: originalname+row_column+.png
	- originalname is the name of the original image from which the patch was taken.
	- row and column give the position of the patch in the 2x6 matrix obtained by partitioning the original image into 12 segments of 365x365 pixels.
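The naming scheme above can be parsed with two regular expressions, as in the following sketch. It rests on several assumptions: the function names are hypothetical, the example file names are made up, numberid may itself contain underscores, and a patch name is assumed to end in row_column.png. Whatever joins originalname to row_column is not specified here, so the patch parser reads only the trailing indices and makes no assumption about whether they start at 0 or 1.

```python
import re

# numberid_originalFolder_8bit.png, with originalFolder in {0, 1, 2}
ORIGINAL_RE = re.compile(r"^(?P<numberid>.+)_(?P<folder>[012])_8bit\.png$")

# Trailing "row_column.png"; the text before it is deliberately not parsed.
PATCH_TAIL_RE = re.compile(r"(?P<row>\d+)_(?P<col>\d+)\.png$")


def parse_original_name(name: str):
    """Return (numberid, originalFolder) or None if the name does not match."""
    match = ORIGINAL_RE.match(name)
    if match is None:
        return None
    return match.group("numberid"), int(match.group("folder"))


def parse_patch_position(name: str):
    """Return (row, column) read from the end of a patch name, or None."""
    match = PATCH_TAIL_RE.search(name)
    if match is None:
        return None
    return int(match.group("row")), int(match.group("col"))


# Made-up names, used only to exercise the patterns.
print(parse_original_name("0001_2_8bit.png"))
print(parse_patch_position("0001_2_8bit_1_5.png"))
```

Checking a few real file names against both functions before relying on them is advisable, since the separator and index base are not documented.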

Number of PNG files

BATAVIA\originals-->		2,755 PNG files
BATAVIA\patches\cases-->	8,782 PNG files
BATAVIA\patches\controls--> 	19,911 PNG files
Note: In the Batavia dataset, the total number of patches (28,693) is lower than the number of originals multiplied by 12 (33,060) because Textil Santanderina reviewed the patches and excluded 4,367 ambiguous ones whose labels (case/control) were unclear.


SARGA\originals-->		1,548 PNG files
SARGA\patches\cases-->		173 PNG files
SARGA\patches\controls--> 	18,403 PNG files
Note: In the Sarga dataset, the total number of patches (18,576) equals the number of originals multiplied by 12.
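The counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures listed in this section; the variable and function names are hypothetical.

```python
PATCHES_PER_ORIGINAL = 12  # 2 x 6 grid per original image

# File counts as listed in this README.
COUNTS = {
    "BATAVIA": {"originals": 2755, "cases": 8782, "controls": 19911},
    "SARGA": {"originals": 1548, "cases": 173, "controls": 18403},
}


def summarize(counts: dict) -> dict:
    """Derive expected, present and excluded patch totals for one fabric."""
    expected = counts["originals"] * PATCHES_PER_ORIGINAL
    present = counts["cases"] + counts["controls"]
    return {
        "expected": expected,
        "present": present,
        "excluded": expected - present,
        "case_fraction": counts["cases"] / present,
    }


summary = {fabric: summarize(c) for fabric, c in COUNTS.items()}
for fabric, stats in summary.items():
    print(fabric, stats)
```

The derived case fractions (about 31% for Batavia and under 1% for Sarga) show that the Sarga patches are heavily imbalanced, which is worth accounting for when choosing training and evaluation metrics.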