dc.contributor.author: Ramos Pérez, Ismael
dc.contributor.author: Arnaiz González, Álvar
dc.contributor.author: Rodríguez Diez, Juan José
dc.contributor.author: García Osorio, César
dc.date.accessioned: 2021-11-19T10:18:52Z
dc.date.available: 2021-11-19T10:18:52Z
dc.date.issued: 2022-02
dc.identifier.issn: 0957-4174
dc.identifier.uri: http://hdl.handle.net/10259/6192
dc.description.abstract: This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classification algorithms, analyzing the results for different percentages of selected features, and establishing the statistical significance using Bayesian tests. Some general conclusions of the study are that it is better to use RUS before the feature selection, while ROS and SMOTE offer better results when applied afterwards. Additionally, specific results are also obtained depending on the classifier used, for example, for Gaussian SVM the best performance is obtained when the feature selection is done with SVM-RFE before balancing the data with RUS. [en]
dc.description.sponsorship: “La Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds. [en]
dc.language.iso: eng [es]
dc.publisher: Elsevier [es]
dc.relation.ispartof: Expert Systems with Applications. 2022, V. 188, 116015 [en]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 Internacional [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Feature selection [en]
dc.subject: Wide data [en]
dc.subject: High dimensional data [en]
dc.subject: Very low sample size [en]
dc.subject: Unbalanced [en]
dc.subject: Machine learning [en]
dc.subject.other: Informática [es]
dc.subject.other: Computer science [en]
dc.title: When is resampling beneficial for feature selection with imbalanced wide data? [en]
dc.type: info:eu-repo/semantics/article [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.relation.publisherversion: https://doi.org/10.1016/j.eswa.2021.116015 [es]
dc.identifier.doi: 10.1016/j.eswa.2021.116015
dc.relation.projectID: info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007 [es]
dc.relation.projectID: info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado [es]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0 [es]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion [es]

