When is resampling beneficial for feature selection with imbalanced wide data?

Ramos Pérez, Ismael; Arnaiz González, Álvar; Rodríguez Diez, Juan José; García Osorio, César

doi:10.1016/j.eswa.2021.116015

dc.contributor.author	Ramos Pérez, Ismael
dc.contributor.author	Arnaiz González, Álvar
dc.contributor.author	Rodríguez Diez, Juan José
dc.contributor.author	García Osorio, César
dc.date.accessioned	2023-01-26T08:51:24Z
dc.date.available	2023-01-26T08:51:24Z
dc.date.issued	2022-02
dc.identifier.issn	0957-4174
dc.identifier.uri	http://hdl.handle.net/10259/7326
dc.description.abstract	This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. To this end, an extensive study is carried out using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation uses 5 classification algorithms, analyzes the results for different percentages of selected features, and establishes statistical significance using Bayesian tests. One general conclusion of the study is that it is better to apply random undersampling (RUS) before feature selection, while random oversampling (ROS) and SMOTE give better results when applied afterwards. Classifier-specific results are also obtained; for example, for Gaussian SVM the best performance is obtained when feature selection is done with SVM-RFE before balancing the data with RUS.	en
dc.description.sponsorship	The project leading to these results has received funding from “la Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds.	en
dc.format.mimetype	application/pdf
dc.language.iso	eng	es
dc.publisher	Elsevier	es
dc.relation.ispartof	Expert Systems with Applications. 2022, V. 188, 116015	es
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Feature selection	en
dc.subject	Wide data	en
dc.subject	High dimensional data	en
dc.subject	Very low sample size	en
dc.subject	Unbalanced	en
dc.subject	Machine learning	en
dc.subject.other	Informática	es
dc.subject.other	Computer science	en
dc.title	When is resampling beneficial for feature selection with imbalanced wide data?	en
dc.type	info:eu-repo/semantics/article	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.relation.publisherversion	https://doi.org/10.1016/j.eswa.2021.116015	es
dc.identifier.doi	10.1016/j.eswa.2021.116015
dc.relation.projectID	info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007/	es
dc.relation.projectID	info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado/	es
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0/	es
dc.journal.title	Expert Systems with Applications	es
dc.volume.number	188	es
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion	es
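
The abstract's main practical finding concerns ordering: RUS is recommended before feature selection, ROS after it. The sketch below is a hypothetical illustration of those two orderings for a binary problem, written in plain Python. It is not the authors' code. The Fisher-style filter stands in for the seven feature selection algorithms evaluated in the paper, SMOTE and SVM-RFE are not implemented, and all function names (`random_undersample`, `random_oversample`, `fisher_scores`, `rus_then_fs`, `fs_then_ros`) are assumptions made for this example.

```python
import random
from statistics import mean, pvariance


def _group_by_class(X, y):
    """Collect the rows of X for each class label (labels in first-seen order)."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            Xb.append(row)
            yb.append(label)
    return Xb, yb


def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows of each class until it matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            Xb.append(row)
            yb.append(label)
    return Xb, yb


def fisher_scores(X, y):
    """Stand-in filter: (difference of class means)^2 / (sum of class variances)."""
    labels = sorted(set(y))
    assert len(labels) == 2, "this sketch assumes a binary problem"
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        num = (mean(groups[0]) - mean(groups[1])) ** 2
        den = pvariance(groups[0]) + pvariance(groups[1])
        scores.append(num / den if den > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def rus_then_fs(X, y, k, seed=0):
    """Ordering recommended for RUS: balance first, then select features."""
    Xb, yb = random_undersample(X, y, seed)
    cols = select_top_k(Xb, yb, k)
    return cols, project(Xb, cols), yb


def fs_then_ros(X, y, k, seed=0):
    """Ordering recommended for ROS: select features on the original data, then balance."""
    cols = select_top_k(X, y, k)
    Xb, yb = random_oversample(project(X, cols), y, seed)
    return cols, Xb, yb


if __name__ == "__main__":
    # Hypothetical toy data: 6 majority rows, 2 minority rows; feature 0 separates the classes.
    y = [0] * 6 + [1] * 2
    X = [[10.0 * lab + 0.1 * (i % 2), float(i % 2), float(i % 3), 1.0]
         for i, lab in enumerate(y)]
    print(rus_then_fs(X, y, k=1)[0])  # → [0]
    print(fs_then_ros(X, y, k=1)[0])  # → [0]
```

On wide data the two orderings differ in what the selector sees. After RUS it scores features on a small balanced sample. Before ROS it scores them on the original rows, so duplicated minority instances cannot inflate the scores.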

Files in this item

Name: Ramos-esa_2022.pdf
Size: 1.593 MB
Format: Adobe PDF
