When is resampling beneficial for feature selection with imbalanced wide data?

Ramos Pérez, Ismael; Arnaiz González, Álvar; Rodríguez Diez, Juan José; García Osorio, César

doi:10.1016/j.eswa.2021.116015

dc.contributor.author	Ramos Pérez, Ismael
dc.contributor.author	Arnaiz González, Álvar
dc.contributor.author	Rodríguez Diez, Juan José
dc.contributor.author	García Osorio, César
dc.date.accessioned	2021-11-19T10:18:52Z
dc.date.available	2021-11-19T10:18:52Z
dc.date.issued	2022-02
dc.identifier.issn	0957-4174
dc.identifier.uri	http://hdl.handle.net/10259/6192
dc.description.abstract	This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classification algorithms, analyzing the results for different percentages of selected features, and establishing the statistical significance using Bayesian tests. Some general conclusions of the study are that it is better to use RUS before the feature selection, while ROS and SMOTE offer better results when applied afterwards. Additionally, specific results are also obtained depending on the classifier used, for example, for Gaussian SVM the best performance is obtained when the feature selection is done with SVM-RFE before balancing the data with RUS.	en
dc.description.sponsorship	This work was supported by the “La Caixa” Foundation under agreement LCF/PR/PR18/51130007, by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE), and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds.	en
dc.language.iso	eng	es
dc.publisher	Elsevier	es
dc.relation.ispartof	Expert Systems with Applications. 2022, V. 188, 116015	en
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Feature selection	en
dc.subject	Wide data	en
dc.subject	High dimensional data	en
dc.subject	Very low sample size	en
dc.subject	Unbalanced	en
dc.subject	Machine learning	en
dc.subject.other	Informática	es
dc.subject.other	Computer science	en
dc.title	When is resampling beneficial for feature selection with imbalanced wide data?	en
dc.type	info:eu-repo/semantics/article	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.relation.publisherversion	https://doi.org/10.1016/j.eswa.2021.116015	es
dc.identifier.doi	10.1016/j.eswa.2021.116015
dc.relation.projectID	info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007	es
dc.relation.projectID	info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado	es
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0	es
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion	es
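
The abstract's general finding concerns ordering: RUS (random undersampling) works better before feature selection, while ROS (random oversampling) and SMOTE work better after it. The two orderings can be sketched as below. This is a minimal illustration and not the authors' experimental code: the mean-difference filter is a toy stand-in for the seven feature selection algorithms studied, SMOTE is not implemented, and all function names and the synthetic data are assumptions made for this example.

```python
import random
from statistics import mean


def _group_by_class(X, y):
    """Group rows by class label, preserving row order."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, rng):
    """RUS: randomly keep only as many rows per class as the minority class has."""
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_bal, y_bal = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def random_oversample(X, y, rng):
    """ROS: duplicate randomly chosen rows until every class matches the majority."""
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_bal, y_bal = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def select_top_k(X, y, k):
    """Toy filter (assumption): rank features by |mean(class 1) - mean(class 0)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def rus_then_select(X, y, k, rng):
    """Ordering 1: balance with RUS first, then select features."""
    X_bal, y_bal = random_undersample(X, y, rng)
    cols = select_top_k(X_bal, y_bal, k)
    return project(X_bal, cols), y_bal, cols


def select_then_ros(X, y, k, rng):
    """Ordering 2: select features first, then balance with ROS."""
    cols = select_top_k(X, y, k)
    X_bal, y_bal = random_oversample(project(X, cols), y, rng)
    return X_bal, y_bal, cols


# Synthetic wide, imbalanced data: 12 instances, 50 features, 9 vs. 3 labels.
rng = random.Random(0)
y = [0] * 9 + [1] * 3
X = [[rng.gauss(0, 1) + (3.0 if j == 0 and label == 1 else 0.0)
      for j in range(50)] for label in y]

Xa, ya, cols_a = rus_then_select(X, y, 5, rng)
Xb, yb, cols_b = select_then_ros(X, y, 5, rng)
```

In this sketch the first ordering scores features on a smaller balanced sample, while the second scores them on all original instances and only then duplicates minority rows. Which ordering performs better for a given classifier is the empirical question the paper addresses.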

Files in this item

Name: Ramos-esa_2022.pdf
Size: 1.593 MB
Format: Adobe PDF
