This README file (version 1) was generated on 2025-01-16 by the dataset authors


GENERAL INFORMATION
-------------------

1. Title of dataset: Original and processed dataset of Batavia and Sarga woven fabric images

2. Authorship

   Name: Beatriz Gil Arroyo
   Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
   Email: bgarroyo@ubu.es
   ORCID: https://orcid.org/0009-0009-8499-093X

   Name: Nuria Velasco-Pérez
   Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
   Email: nuriavp@ubu.es
   ORCID: https://orcid.org/0009-0008-2988-757X

   Name: Nuño Basurto
   Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
   Email: nbasurto@ubu.es
   ORCID: https://orcid.org/0000-0001-7289-4689

   Name: Juan Marcos Sanz
   Institution: Textil Santanderina, N-634, km 43, Cabezón de la Sal, 39500, Spain
   Email: juanmarcos@tsanta.es
   ORCID: https://orcid.org/0000-0002-2024-9909

   Name: Angel Arroyo
   Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
   Email: aarroyop@ubu.es
   ORCID: https://orcid.org/0000-0002-1614-9075

   Name: Daniel Urda
   Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
   Email: durda@ubu.es
   ORCID: https://orcid.org/0000-0003-2662-798X

   Name: Álvaro Herrero
   Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
   Email: ahcosio@ubu.es
   ORCID: https://orcid.org/0000-0002-2444-5384


DESCRIPTION
-----------

1. Dataset language
   English

2. Abstract
   The dataset contains images of Batavia and Sarga woven fabrics. It is designed for training AI models for defect detection and quality assessment.

3.
Keywords
   Textile defect detection, Fabric quality control, Industry 4.0, Deep Learning in manufacturing, Convolutional neural networks, Image analysis and classification.

4. Date of data collection
   November 2022

5. Date of dataset publication
   January 2025

6. Funding
   Funding for this project was provided by the DECENT (Deep Learning for automatic Textile Inspection) initiative under the DIH-World 2nd Open Call framework. The authors thank INADE for their collaboration in acquiring the images.

7. Geographic location/s of data collection
   Textil Santanderina, N-634, km 43, Cabezón de la Sal, 39500, Spain


ACCESS INFORMATION
------------------

1. Dataset Creative Commons License

2. Dataset DOI

3. Related publication
   (Work in progress)


METHODOLOGICAL INFORMATION
--------------------------

Textil Santanderina has equipped its facility with a Basler raL camera featuring the Awaiba DR-12k-3.5 CMOS sensor, capable of a line rate of 8 kHz at a resolution of 12k. Operating in the VIS-NIR band, the camera is fitted with Basler proprietary optics that give a resolution of 10 px/mm. For uniform illumination, an array of 360 infrared (IR) LEDs emitting at 850 nm and covering a field of 15 mm x 1510 mm is positioned 20 cm above the fabric at a 15° angle of incidence. This IR illumination ensures compatibility with the existing D65 standard illumination systems used for visual inspection. The camera is synchronized with the fabric using a 10-bit inductive encoder to ensure high-precision (<1/10 mm) positioning.

Lighting conditions, setup geometry, optics, and CMOS sensor specifications were chosen to meet additional requirements for defect inspection, including fabric width, batch length, and inspection speed, among others.

The experimentation was conducted on two different woven fabrics (Batavia and Sarga).
These textiles were selected for their distinctive characteristics and their relevance to a range of textile applications. Including both makes it possible to assess the proposed methods across a broader range of textile inspection scenarios, and to compare the performance of machine learning algorithms under varying conditions, which clarifies how well they address specific challenges in the textile industry.

As an initial preprocessing step, the original 16-bit images were rescaled to their corresponding 8-bit versions. This reduced the number of possible values per pixel from 65,536 to 256. Subsequently, to support fine-grained defect detection and to enlarge the dataset, each 8-bit 2048x696 image was partitioned into twelve patches of 365x365 pixels (a 2x6 grid), with small overlaps between neighbouring patches along both the horizontal and vertical axes.


FILE INFORMATION
-----------------

The folders in the dataset are organized by type of weave:

- BATAVIA: contains the images of Batavia fabrics.
- SARGA: contains the images of Sarga fabrics.

All images are in PNG format.

Within each of the Batavia and Sarga folders, the following directory structure is present:

- Originals: contains the 2048x696 8-bit images.
- Patches: contains the 365x365 cropped images, with small overlapping areas along both the horizontal and vertical axes, further classified into the following folders:
    - Cases: contains patches with defects.
    - Controls: contains patches without defects.
    - info_patches: a CSV file that contains 4 columns:
        - image_name: the name of the original image from which the patch was derived.
        - patch_name: the name of the patch.
        - ground_truth: the actual condition as labeled by experts (0 if no defect, 1 if defect present).
        - operator_label: the operator's output (0 if no defect, 1 if defect present), which serves as the baseline against which AI models are evaluated.

The PNG files in the dataset are named as follows:

Original images: numberid_originalFolder_8bit.png (referred to below as originalname)
    - numberid is a string that identifies the image.
    - originalFolder is a number (0, 1, 2) that identifies the original folder in which Textil Santanderina sent the images.

Patches: originalname+row_column+.png
    - originalname is the name of the large image from which the patch originates.
    - row and column give the position of the patch in the 2x6 matrix obtained by partitioning the original image into 12 segments of 365x365 pixels.

Number of PNG files

BATAVIA\originals --> 2,755 PNG files
BATAVIA\patches\cases --> 8,782 PNG files
BATAVIA\patches\controls --> 19,911 PNG files

Note: In the Batavia dataset, the total number of patches (28,693) does not equal the number of originals multiplied by 12 (2,755 x 12 = 33,060). During a review by Textil Santanderina, 4,367 ambiguous patches were excluded because their labels (case/control) were unclear.

SARGA\originals --> 1,548 PNG files
SARGA\patches\cases --> 173 PNG files
SARGA\patches\controls --> 18,403 PNG files

Note: In the Sarga dataset, the total number of patches (18,576) equals the number of originals multiplied by 12 (1,548 x 12 = 18,576).
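
The preprocessing and the file counts described in this README can be illustrated with a short Python sketch. It is not the authors' code and rests on several assumptions that this README does not confirm: the 16-bit to 8-bit rescaling is shown as dropping the low byte (the authors may have used a different mapping, such as a per-image stretch), the overlap between patches is assumed to be spread evenly across the grid, and row and column indices are assumed to start at 0. All function names (rescale_16_to_8, patch_starts, patch_boxes, expected_patches) are hypothetical.

```python
from typing import List, Tuple

ORIG_W, ORIG_H = 2048, 696  # size of the 8-bit originals (from this README)
PATCH = 365                 # patch side in pixels (from this README)
ROWS, COLS = 2, 6           # 2x6 patch grid (from this README)

# (row, column, left, top, right, bottom); right and bottom are exclusive
Box = Tuple[int, int, int, int, int, int]


def rescale_16_to_8(value: int) -> int:
    """Map a 16-bit pixel value (0..65535) to 8 bits (0..255).

    ASSUMPTION: plain bit-depth reduction by dropping the low byte.
    """
    if not 0 <= value <= 65535:
        raise ValueError("expected an unsigned 16-bit value")
    return value >> 8


def patch_starts(length: int, patch: int, count: int) -> List[int]:
    """Start offsets of `count` patches that together span `length` pixels.

    ASSUMPTION: first patch at 0, last patch flush with the far edge,
    the others evenly spaced in between (integer floor division).
    """
    if count < 2:
        return [0]
    span = length - patch  # start offset of the last patch
    return [i * span // (count - 1) for i in range(count)]


def patch_boxes() -> List[Box]:
    """Crop boxes for the 12 patches of one original image."""
    tops = patch_starts(ORIG_H, PATCH, ROWS)
    lefts = patch_starts(ORIG_W, PATCH, COLS)
    return [
        (r, c, x, y, x + PATCH, y + PATCH)
        for r, y in enumerate(tops)
        for c, x in enumerate(lefts)
    ]


def expected_patches(originals: int, excluded: int = 0) -> int:
    """Patches expected from `originals` images after removing `excluded`."""
    return originals * ROWS * COLS - excluded


if __name__ == "__main__":
    for box in patch_boxes():
        print(box)
    # Counts reported in this README
    print("Batavia:", expected_patches(2755, excluded=4367), "vs", 8782 + 19911)
    print("Sarga:", expected_patches(1548), "vs", 173 + 18403)
```

Under these assumptions, neighbouring patches overlap by 28 or 29 pixels horizontally and by 34 pixels vertically. The (left, top, right, bottom) part of each box follows the convention used by common image libraries for cropping, so it could be passed to a crop routine after loading an image from the Originals folder. The actual offsets used to build the Patches folders may differ.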