dc.contributor.author | Ramos Pérez, Ismael | |
dc.contributor.author | Arnaiz González, Álvar | |
dc.contributor.author | Rodríguez Diez, Juan José | |
dc.contributor.author | García Osorio, César | |
dc.date.accessioned | 2023-01-26T08:51:24Z | |
dc.date.available | 2023-01-26T08:51:24Z | |
dc.date.issued | 2022-02 | |
dc.identifier.issn | 0957-4174 | |
dc.identifier.uri | http://hdl.handle.net/10259/7326 | |
dc.description.abstract | This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classification algorithms, analyzing the results for different percentages of selected features, and establishing the statistical significance using Bayesian tests. Some general conclusions of the study are that it is better to use RUS before the feature selection, while ROS and SMOTE offer better results when applied afterwards. Additionally, specific results are also obtained depending on the classifier used, for example, for Gaussian SVM the best performance is obtained when the feature selection is done with SVM-RFE before balancing the data with RUS. | en |
dc.description.sponsorship | The project leading to these results has received funding from “la Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds. | en |
dc.format.mimetype | application/pdf | |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Expert Systems with Applications. 2022, V. 188, 116015 | es |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Feature selection | en |
dc.subject | Wide data | en |
dc.subject | High dimensional data | en |
dc.subject | Very low sample size | en |
dc.subject | Unbalanced | en |
dc.subject | Machine learning | en |
dc.subject.other | Informática | es |
dc.subject.other | Computer science | en |
dc.title | When is resampling beneficial for feature selection with imbalanced wide data? | en |
dc.type | info:eu-repo/semantics/article | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.relation.publisherversion | https://doi.org/10.1016/j.eswa.2021.116015 | es |
dc.identifier.doi | 10.1016/j.eswa.2021.116015 | |
dc.relation.projectID | info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007/ | es |
dc.relation.projectID | info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado/ | es |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0/ | es |
dc.journal.title | Expert Systems with Applications | es |
dc.volume.number | 188 | es |
dc.type.hasVersion | info:eu-repo/semantics/publishedVersion | es |