This README file (version 1) was generated on 2025-05-19 by the dataset authors.

GENERAL INFORMATION
-------------------

1. Title of dataset:
Time Series Sensor Data for Cutting Fluid Analysis from the SmarTaladrine Project

2. Authorship

Name: Félix de Miguel Villalba
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: fdemiguel@ubu.es
ORCID: https://orcid.org/0009-0000-9607-110X

Name: Nuria Velasco Pérez
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: nuriavp@ubu.es
ORCID: https://orcid.org/0009-0008-2988-757X

Name: Félix Movilla Alonso
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: fmovilla@ubu.es
ORCID: https://orcid.org/0009-0009-3142-1426

Name: Carlos Cambra Baseca
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: ccbaseca@ubu.es
ORCID: https://orcid.org/0000-0001-5567-9194

Name: Daniel Urda
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: durda@ubu.es
ORCID: https://orcid.org/0000-0003-2662-798X

Name: Álvaro Herrero
Institution: Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Universidad de Burgos
Email: ahcosio@ubu.es
ORCID: https://orcid.org/0000-0002-2444-5384

DESCRIPTION
-----------

1. Dataset language
Spanish

2. Abstract
This dataset contains multivariate time series data collected from sensors monitoring a cutting fluid (taladrina) test tank as part of the "SmarTaladrine" project. The monitored variables include pH, Temperature, Concentration, and Conductivity. Data spans from February 27, 2025, to April 1, 2025. A gap in the data between March 14 and March 20, 2025, has been imputed using the MOMENT time series foundation model.
The dataset is intended for developing and evaluating models for cutting fluid analysis, anomaly detection, and predictive maintenance within industrial machining operations.

3. Keywords
Cutting fluid, Taladrina, Time series, Sensors, Imputation, Data analysis, Predictive maintenance, Anomaly detection

4. Date of data collection
February 27, 2025 - April 1, 2025

5. Date of dataset publication
May 2025

6. Funding
This work was supported by the project "Solución innovadora, embebida en máquina o auxiliar, para la eco gestión inteligente de taladrinas en operaciones de mecanizado (SmarTaladrine)", funded by the Programa Estatal para Impulsar la Investigación Científico-Técnica y su Transferencia, del Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023, within the framework of the Plan de Recuperación, Transformación y Resiliencia (Spain).

7. Geographic location/s of data collection
HYPATIA GNC ACCESORIOS S.A.
C. Condado de Treviño, 53, 09001 Burgos, Spain.

ACCESS INFORMATION
------------------

1. Dataset Creative Commons License

2. Dataset DOI:

3. Related publication
(Work in progress)

METHODOLOGICAL INFORMATION
--------------------------

Data was collected from a cutting fluid test tank located at HYPATIA GNC ACCESORIOS S.A. The sensor setup included:

- pH: Industrial pH Probe (Atlas Scientific)
- Conductivity: Industrial Conductivity Probe k 1.0
- Concentration & Temperature: Atago CM-BASEβ (A/D) refractometer (measuring concentration and temperature between 10°C and 50°C).

Raw sensor readings were initially captured in JSON format and subsequently converted to CSV files, preserving the original timestamps.

The collected raw data underwent several preprocessing steps:

1. Filtering: Anomalous values were identified and removed using statistical methods such as Interquartile Range (IQR) and Z-score analysis.

2. Resampling: The filtered data was resampled to a consistent 5-minute granularity.
The median value within each 5-minute interval was used as the representative value for that interval.

3. Imputation: A significant known gap in the data, spanning from March 14, 2025, to March 20, 2025, was addressed for all four variables (pH, Temperature, Concentration, Conductivity). This primary imputation was performed using the MOMENT time series foundation model. Specifically, the pre-trained "AutonLab/MOMENT-1-large" model was fine-tuned on the available clean base data segments for each variable independently to fill this extended missing period. (Model Reference DOI: https://doi.org/10.48550/arXiv.2402.03885). Other imputation techniques, including time-series models like ARIMA and deep learning approaches such as LSTM-based Variational Autoencoders (VAEs), were explored and used to fill shorter gaps or to compare imputation performance on different types of data gaps.

The resulting dataset is designed to facilitate future research in cutting fluid monitoring, anomaly detection, and predictive maintenance strategies.

FILE INFORMATION
-----------------

The dataset consists of three primary CSV files:

1. measures_part1.csv:
- Content: Contains the preprocessed sensor data (filtered and resampled to 5-min granularity) from February 27, 2025, to March 14, 2025.
- Columns: Fecha, Hora, pH, Temperatura, Concentración, Conductividad.

2. measures_part2.csv:
- Content: Contains the preprocessed sensor data (filtered and resampled to 5-min granularity) from March 20, 2025, to April 1, 2025.
- Columns: Fecha, Hora, pH, Temperatura, Concentración, Conductividad.

3. measures_full.csv:
- Content: Contains the complete time series from February 27, 2025, to April 1, 2025 (at 5-min granularity). The data gap between March 14 and March 20 has been filled using the imputation described under METHODOLOGICAL INFORMATION.
- Columns: Fecha, Hora, pH, Temperatura, Concentracion, Conductividad.
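As an illustration of how the files listed above might be read, here is a minimal Python sketch using pandas. It assumes the comma delimiter and UTF-8 encoding stated in this README. The helper names (`strip_accents`, `load_measures`), the default `datetime_format`, and the `timestamp` index name are hypothetical: this README does not specify how Fecha and Hora are written, so the format string should be adjusted to match the actual files. Column names are normalized by removing accents so that "Concentración" and "Concentracion" can be handled uniformly.

```python
import io
import unicodedata

import pandas as pd


def strip_accents(name: str) -> str:
    """Return `name` without accents, e.g. 'Concentración' -> 'Concentracion'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def load_measures(source, datetime_format="%Y-%m-%d %H:%M:%S"):
    """Load one dataset CSV into a DataFrame indexed by timestamp.

    `source` is a file path or a file-like object.
    `datetime_format` is an ASSUMPTION about how Fecha and Hora are written;
    it is not documented in the README and may need to be changed.
    """
    df = pd.read_csv(source, sep=",", encoding="utf-8")
    # Normalize headers so accented and unaccented spellings match.
    df.columns = [strip_accents(col.strip()) for col in df.columns]
    # Combine the Fecha (date) and Hora (time) columns into one timestamp.
    combined = df["Fecha"].astype(str) + " " + df["Hora"].astype(str)
    stamps = pd.to_datetime(combined, format=datetime_format)
    df = df.drop(columns=["Fecha", "Hora"])
    df.index = pd.DatetimeIndex(stamps, name="timestamp")
    return df.sort_index()
```

Under these assumptions, `load_measures("measures_full.csv")` would return a DataFrame with the columns pH, Temperatura, Concentracion, and Conductividad, indexed by timestamp.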
All CSV files use UTF-8 encoding and a comma (,) as the delimiter. Timestamp information is provided in the Fecha (Date) and Hora (Time) columns.

For questions regarding the dataset, please contact the corresponding author(s).
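For readers who want to apply a comparable preprocessing to their own raw readings, the filtering and resampling steps listed under METHODOLOGICAL INFORMATION can be approximated with the following Python sketch. It is not the authors' pipeline. The function names, the IQR multiplier `k = 1.5` (a common convention), and the pandas default of labelling each interval by its left edge are assumptions; the README does not state the thresholds or the interval alignment that were actually used.

```python
import pandas as pd


def iqr_filter(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is a conventional choice, NOT a value taken from the README.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    keep = series.between(q1 - k * iqr, q3 + k * iqr)
    return series[keep]


def resample_median(series: pd.Series, rule: str = "5min") -> pd.Series:
    """Median of the readings in each interval of width `rule`.

    The series must have a DatetimeIndex. Intervals without readings
    yield NaN, which is where an imputation step would apply.
    """
    return series.resample(rule).median()
```

A typical call would be `resample_median(iqr_filter(raw_ph))`, where `raw_ph` is a hypothetical pandas Series of raw pH readings indexed by timestamp.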