RT info:eu-repo/semantics/article
T1 When is resampling beneficial for feature selection with imbalanced wide data?
A1 Ramos Pérez, Ismael
A1 Arnaiz González, Álvar
A1 Rodríguez Diez, Juan José
A1 García Osorio, César
K1 Feature selection
K1 Wide data
K1 High dimensional data
K1 Very low sample size
K1 Unbalanced
K1 Machine learning
K1 Informática
K1 Computer science
AB This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classification algorithms, analyzing the results for different percentages of selected features, and establishing the statistical significance using Bayesian tests. Some general conclusions of the study are that it is better to use RUS before the feature selection, while ROS and SMOTE offer better results when applied afterwards. Additionally, specific results are also obtained depending on the classifier used, for example, for Gaussian SVM the best performance is obtained when the feature selection is done with SVM-RFE before balancing the data with RUS.
PB Elsevier
SN 0957-4174
YR 2022
FD 2022-02
LK http://hdl.handle.net/10259/7326
UL http://hdl.handle.net/10259/7326
LA eng
NO The project leading to these results has received funding from “la Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds.
DS Repositorio Institucional de la Universidad de Burgos
RD 09-may-2024