When is resampling beneficial for feature selection with imbalanced wide data?

Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/6192

Title

When is resampling beneficial for feature selection with imbalanced wide data?

Authors

Ramos Pérez, Ismael

Arnaiz González, Álvar

Rodríguez Diez, Juan José

García Osorio, César

Published in

Expert Systems with Applications. 2022, V. 188, 116015

Publisher

Elsevier

Publication date

2022-02

ISSN

0957-4174

DOI

10.1016/j.eswa.2021.116015

Abstract

This paper studies how combinations of balancing and feature selection techniques affect wide data (many more attributes than instances) when different classifiers are used. To this end, an extensive study is conducted using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation uses 5 classification algorithms, analyzes the results for different percentages of selected features, and establishes statistical significance with Bayesian tests. One general conclusion is that RUS (random undersampling) works better when applied before feature selection, whereas ROS (random oversampling) and SMOTE offer better results when applied afterwards. Classifier-specific results are also obtained: for example, with a Gaussian SVM the best performance is obtained when feature selection is done with SVM-RFE before balancing the data with RUS.
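The two orderings contrasted in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the synthetic data, the helper names, and in particular `top_k_features`, a toy mean-difference filter standing in for a real feature selector (it is not one of the seven algorithms evaluated in the paper). SMOTE is omitted; only random undersampling and random oversampling are shown.

```python
import random
from statistics import mean


def class_indices(y):
    """Map each class label to the list of row indices carrying it."""
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label, []).append(i)
    return groups


def random_undersample(X, y, rng):
    """RUS: keep a random subset of every class, sized to the smallest class."""
    groups = class_indices(y)
    n_min = min(len(idx) for idx in groups.values())
    keep = sorted(i for idx in groups.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, rng):
    """ROS: duplicate random rows of smaller classes up to the largest class size."""
    groups = class_indices(y)
    n_max = max(len(idx) for idx in groups.values())
    keep = list(range(len(y)))
    for idx in groups.values():
        keep += rng.choices(idx, k=n_max - len(idx))
    return [X[i] for i in keep], [y[i] for i in keep]


def top_k_features(X, y, k):
    """Toy filter (hypothetical): rank features by |mean(class 1) - mean(class 0)|."""
    def score(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(pos) - mean(neg))
    ranked = sorted(range(len(X[0])), key=score, reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def rus_then_select(X, y, k, rng):
    """Balance first (RUS), then select features on the balanced sample."""
    Xb, yb = random_undersample(X, y, rng)
    cols = top_k_features(Xb, yb, k)
    return project(Xb, cols), yb, cols


def select_then_ros(X, y, k, rng):
    """Select features on the original sample, then balance (ROS)."""
    cols = top_k_features(X, y, k)
    Xb, yb = random_oversample(project(X, cols), y, rng)
    return Xb, yb, cols


# Synthetic wide, imbalanced data: 20 instances, 200 features, 4 minority rows.
rng = random.Random(0)
y = [1] * 4 + [0] * 16
X = [[rng.gauss(0, 1) for _ in range(200)] for _ in y]

X_a, y_a, cols_a = rus_then_select(X, y, 10, rng)
X_b, y_b, cols_b = select_then_ros(X, y, 10, rng)
```

In an actual evaluation, both the balancing and the feature selection steps would be fitted on the training folds only, so that no information from the test instances leaks into the selected features.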

Keywords

Feature selection

Wide data

High dimensional data

Very low sample size

Unbalanced

Machine learning

Subject

Computer science

URI

http://hdl.handle.net/10259/6192

Publisher's version

https://doi.org/10.1016/j.eswa.2021.116015

Document(s) subject to a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license

Files in this item

Name:

Ramos-esa_2022.pdf

Size:

1.593 MB

Format:

Adobe PDF