RT info:eu-repo/semantics/article
T1 When is resampling beneficial for feature selection with imbalanced wide data?
A1 Ramos Pérez, Ismael
A1 Arnaiz González, Álvar
A1 Rodríguez Diez, Juan José
A1 García Osorio, César
K1 Feature selection
K1 Wide data
K1 High dimensional data
K1 Very low sample size
K1 Unbalanced
K1 Machine learning
K1 Computer science
AB This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. To this end, an extensive study is carried out using 14 datasets, 3 balancing strategies and 7 feature selection algorithms. The evaluation uses 5 classification algorithms, analyzes the results for different percentages of selected features, and establishes statistical significance with Bayesian tests. Among the general conclusions, random undersampling (RUS) works better when applied before feature selection, whereas random oversampling (ROS) and SMOTE give better results when applied afterwards. Classifier-specific results are also reported: for example, for a Gaussian SVM the best performance is obtained when feature selection is done with SVM-RFE before balancing the data with RUS.
PB Elsevier
SN 0957-4174
YR 2022
FD 2022-02
LK http://hdl.handle.net/10259/6192
UL http://hdl.handle.net/10259/6192
LA eng
NO Supported by the “La Caixa” Foundation under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds.
DS Repositorio Institucional de la Universidad de Burgos
RD 25-Apr-2024
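The study's central question is whether resampling should come before or after feature selection. Here is a minimal sketch of the two orderings the abstract singles out (RUS before selection, ROS after selection). It uses plain Python and assumes binary labels. RUS and ROS are written in their standard random form. The helpers `mean_difference_ranking`, `balance_then_select` and `select_then_balance` are hypothetical, and the mean-difference filter is a toy stand-in, not one of the seven feature selection algorithms evaluated in the paper. SMOTE, the classifiers and the Bayesian tests are not reproduced.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def _group_by_class(X: Matrix, y: Sequence[int]) -> Dict[int, Matrix]:
    """Collect the rows of X under their class label."""
    groups: Dict[int, Matrix] = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X: Matrix, y: Sequence[int], seed: int = 0) -> Tuple[Matrix, List[int]]:
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label in sorted(groups):
        for row in rng.sample(groups[label], n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X: Matrix, y: Sequence[int], seed: int = 0) -> Tuple[Matrix, List[int]]:
    """ROS: add randomly duplicated rows to each class up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label in sorted(groups):
        rows = groups[label]
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def mean_difference_ranking(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Hypothetical toy filter for binary labels: indices of the k features whose
    class means differ most. Stands in for real selectors such as SVM-RFE."""
    neg, pos = sorted(set(y))  # fails unless there are exactly two classes
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == neg]
        b = [row[j] for row, lab in zip(X, y) if lab == pos]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top)


def project(X: Matrix, cols: Sequence[int]) -> Matrix:
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def balance_then_select(X, y, k, balancer):
    """Resample first, then select features on the balanced data."""
    X_bal, y_bal = balancer(X, y)
    cols = mean_difference_ranking(X_bal, y_bal, k)
    return project(X_bal, cols), y_bal, cols


def select_then_balance(X, y, k, balancer):
    """Select features on the original data, then resample the reduced data."""
    cols = mean_difference_ranking(X, y, k)
    X_bal, y_bal = balancer(project(X, cols), y)
    return X_bal, y_bal, cols


# Toy wide dataset: 10 rows x 50 features, 3 minority rows (label 1).
# Feature 7 is made informative by adding +10 to it in the minority rows.
rng = random.Random(42)
X = [[rng.random() for _ in range(50)] for _ in range(10)]
y = [1, 1, 1] + [0] * 7
for i in range(3):
    X[i][7] += 10.0

X_a, y_a, cols_a = balance_then_select(X, y, 5, random_undersample)  # RUS, then FS
X_b, y_b, cols_b = select_then_balance(X, y, 5, random_oversample)   # FS, then ROS
print("RUS -> FS:", cols_a, "rows:", len(X_a))
print("FS -> ROS:", cols_b, "rows:", len(X_b))
```

With RUS first, the selector scores features on only twice the minority-class size in rows (6 here). With selection first, it sees every original row, and ROS only duplicates rows of the already reduced matrix. The sketch shows the mechanics of each ordering only. It makes no claim about which ordering performs better.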