Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/7326
Title
When is resampling beneficial for feature selection with imbalanced wide data?
Published in
Expert Systems with Applications. 2022, V. 188, 116015
Publisher
Elsevier
Publication date
2022-02
ISSN
0957-4174
DOI
10.1016/j.eswa.2021.116015
Abstract
This paper studies the effects that combinations of balancing and feature selection techniques have on wide
data (many more attributes than instances) when different classifiers are used. To this end, an extensive study is
carried out using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation uses
5 classification algorithms, analyzes the results for different percentages of selected features, and
establishes statistical significance using Bayesian tests.
Some general conclusions of the study are that it is better to apply random undersampling (RUS) before feature
selection, while random oversampling (ROS) and SMOTE offer better results when applied afterwards.
Classifier-specific results are also obtained: for example, for Gaussian SVM the best performance is obtained
when feature selection is done with SVM-RFE before balancing the data with RUS.
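The two orderings compared in the abstract (balancing before feature selection versus balancing after it) can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`random_undersample`, `random_oversample`, `select_top_features`, `balance_then_select`, `select_then_balance`) are hypothetical, the synthetic data are random noise, and the mean-difference filter is a simple stand-in, not one of the 7 feature selection algorithms used in the study.

```python
import random

def _indices_by_class(y):
    """Group row indices by class label."""
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label, []).append(i)
    return groups

def random_undersample(X, y, rng):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    groups = _indices_by_class(y)
    n_min = min(len(idx) for idx in groups.values())
    keep = sorted(i for idx in groups.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, rng):
    """ROS: duplicate random rows of each class up to the size of the largest class."""
    groups = _indices_by_class(y)
    n_max = max(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        keep.extend(idx)
        keep.extend(rng.choice(idx) for _ in range(n_max - len(idx)))
    return [X[i] for i in keep], [y[i] for i in keep]

def select_top_features(X, y, fraction):
    """Toy filter (assumption): rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(X[0])
    k = max(1, round(fraction * n_features))

    def class_mean(col, label):
        vals = [row[col] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)

    scores = [abs(class_mean(j, 1) - class_mean(j, 0)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

def balance_then_select(X, y, balance, fraction, rng):
    """Ordering 1: resample first, then select features on the balanced data."""
    Xb, yb = balance(X, y, rng)
    cols = select_top_features(Xb, yb, fraction)
    return project(Xb, cols), yb, cols

def select_then_balance(X, y, balance, fraction, rng):
    """Ordering 2: select features on the original data, then resample."""
    cols = select_top_features(X, y, fraction)
    Xb, yb = balance(project(X, cols), y, rng)
    return Xb, yb, cols

# Synthetic wide, imbalanced data: 12 instances, 50 features, 9 vs. 3 per class.
rng = random.Random(0)
y = [0] * 9 + [1] * 3
X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in y]

# RUS before feature selection; ROS after feature selection (10% of features kept).
X_rus, y_rus, cols_rus = balance_then_select(X, y, random_undersample, 0.1, rng)
X_ros, y_ros, cols_ros = select_then_balance(X, y, random_oversample, 0.1, rng)
```

In an actual evaluation, both steps would normally be fitted on the training folds only, so that resampled or selected information does not leak into the test data.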
Keywords
Feature selection
Wide data
High dimensional data
Very low sample size
Unbalanced
Machine learning
Subject
Computer science
This document is subject to a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license