2024-03-28T20:13:40Z
https://riubu.ubu.es/oai/request
oai:riubu.ubu.es:10259/6192
2022-11-23T11:03:38Z
com_10259_4219
com_10259_5086
com_10259_2604
com_10259_6190
com_10259_6189
com_10259.4_106
col_10259_4220
col_10259_6191
Ramos Pérez, Ismael
751
500
Arnaiz González, Álvar
39
600
0000-0001-6965-0237
Rodríguez Diez, Juan José
477
600
García Osorio, César
212
600
0000-0002-1206-1084
2021-11-19T10:18:52Z
2021-11-19T10:18:52Z
2022-02
0957-4174
http://hdl.handle.net/10259/6192
10.1016/j.eswa.2021.116015
This paper studies how combinations of balancing and feature selection techniques affect wide data (many more attributes than instances) when different classifiers are used. To this end, an extensive study is conducted using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out with 5 classification algorithms, analyzing the results for different percentages of selected features and establishing statistical significance with Bayesian tests.
Some general conclusions of the study are that it is better to apply random undersampling (RUS) before feature selection, while random oversampling (ROS) and SMOTE offer better results when applied afterwards. Specific results are also obtained depending on the classifier used. For example, for Gaussian SVM the best performance is obtained when feature selection is done with SVM-RFE before balancing the data with RUS.
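The two orderings compared in the abstract (balancing before feature selection versus balancing afterwards) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' code: the function names are hypothetical, the toy data are random, and the feature ranking (absolute difference of class means) is a simple stand-in filter rather than any of the 7 feature selection algorithms evaluated in the paper. Only the standard library is used, and binary labels 0/1 are assumed.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]


def select_top_features(X, y, fraction):
    """Stand-in filter (assumption): rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(X[0])
    k = max(1, round(fraction * n_features))

    def score(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

    ranked = sorted(range(n_features), key=score, reverse=True)
    return sorted(ranked[:k])


def balance_then_select(X, y, fraction, seed=0):
    """Ordering 1: balance first, then select features on the balanced data."""
    Xb, yb = random_undersample(X, y, seed)
    cols = select_top_features(Xb, yb, fraction)
    return [[row[j] for j in cols] for row in Xb], yb, cols


def select_then_balance(X, y, fraction, seed=0):
    """Ordering 2: select features on the original data, then balance."""
    cols = select_top_features(X, y, fraction)
    Xs = [[row[j] for j in cols] for row in X]
    Xb, yb = random_undersample(Xs, y, seed)
    return Xb, yb, cols


# Toy imbalanced wide data: 12 instances, 50 features, 9 majority vs 3 minority.
rng = random.Random(42)
y = [0] * 9 + [1] * 3
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in y]

Xa, ya, cols_a = balance_then_select(X, y, fraction=0.1)
Xb, yb, cols_b = select_then_balance(X, y, fraction=0.1)
```

Both orderings return a balanced set of 6 instances with 5 selected features, but the selected columns may differ because the ranking sees different instances in each case. In a real experiment the selection and balancing would be fitted on training folds only, and the resulting data passed to a classifier for evaluation.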
“La Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds.
eng
Elsevier
Expert Systems with Applications. 2022, V. 188, 116015
https://doi.org/10.1016/j.eswa.2021.116015
info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007
info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0
Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
Feature selection
Wide data
High dimensional data
Very low sample size
Unbalanced
Machine learning
Informática
Computer science
When is resampling beneficial for feature selection with imbalanced wide data?
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
THUMBNAIL
Ramos-esa_2022.pdf.jpg
Ramos-esa_2022.pdf.jpg
IM Thumbnail
image/jpeg
5019
https://riubu.ubu.es/bitstream/10259/6192/4/Ramos-esa_2022.pdf.jpg
40550d68c45bd81eb72b16f426ab1ecf
MD5
4
LICENSE
license.txt
license.txt
text/plain; charset=utf-8
1362
https://riubu.ubu.es/bitstream/10259/6192/3/license.txt
5d013bfa6e473ff0db22cd82a4d71a70
MD5
3
CC-LICENSE
license_rdf
license_rdf
application/rdf+xml; charset=utf-8
805
https://riubu.ubu.es/bitstream/10259/6192/2/license_rdf
4460e5956bc1d1639be9ae6146a50347
MD5
2
ORIGINAL
Ramos-esa_2022.pdf
Ramos-esa_2022.pdf
application/pdf
1670522
https://riubu.ubu.es/bitstream/10259/6192/1/Ramos-esa_2022.pdf
db5c9b6bc3afb58e0032f1517f7fd590
MD5
1
10259/6192
oai:riubu.ubu.es:10259/6192
2022-11-23 12:03:38.378
Repositorio Institucional de la Universidad de Burgos
bubrep@ubu.es