2024-03-29T06:25:07Z
https://riubu.ubu.es/oai/request
oai:riubu.ubu.es:10259/6206
2022-11-21T12:51:46Z
com_10259_5377
com_10259_5086
com_10259_2604
col_10259_5378
Juez Gil, Mario
747
500
Arnaiz González, Álvar
39
600
0000-0001-6965-0237
Rodríguez Diez, Juan José
477
600
López Nozal, Carlos
322
600
0000-0001-8462-212X
García Osorio, César
212
600
0000-0002-1206-1084
2021-11-23T08:25:06Z
2021-11-23T08:25:06Z
2021-11
0925-2312
http://hdl.handle.net/10259/6206
10.1016/j.neucom.2021.08.086
One of the main goals of Big Data research is to find new data mining methods that can process large amounts of data in acceptable time. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed, with the added requirement that the solution run in an acceptable execution time. In this paper we present Approx-SMOTE, a parallel implementation of the SMOTE algorithm for the Apache Spark framework. The key difference from the original SMOTE, besides parallelism, is that it uses an approximate version of k-Nearest Neighbor, which makes it highly scalable. Although an implementation of SMOTE for Big Data already exists (SMOTE-BD), it uses an exact Nearest Neighbor search, which limits its scalability. Approx-SMOTE, on the other hand, achieves run times up to 30 times faster without sacrificing the improved classification performance offered by the original SMOTE.
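The idea in the abstract (SMOTE interpolation driven by an approximate rather than exact neighbor search) can be illustrated with a minimal single-process sketch. This is not the authors' Spark implementation: the abstract does not say which approximate search is used, so the random-projection bucketing below is a stand-in chosen for brevity, and the names `approx_neighbors`, `smote`, and `n_buckets` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_neighbors(points, k, n_buckets, rng):
    """Approximate kNN (illustrative stand-in, not the paper's method).

    Points are ordered along one random direction and cut into buckets;
    neighbors are searched exactly, but only inside each bucket.
    """
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    order = sorted(
        range(len(points)),
        key=lambda i: sum(d * x for d, x in zip(direction, points[i])),
    )
    # Bucket size: at least k + 1 so a full bucket can supply k neighbors.
    size = max(k + 1, -(-len(points) // n_buckets))
    neighbors = {}
    for start in range(0, len(order), size):
        bucket = order[start:start + size]
        for i in bucket:
            others = [j for j in bucket if j != i]
            others.sort(key=lambda j: squared_distance(points[i], points[j]))
            neighbors[i] = others[:k]
    return neighbors


def smote(minority, n_new, k=5, n_buckets=4, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    neighbors = approx_neighbors(minority, k, n_buckets, rng)
    # A trailing bucket may hold a single point with no neighbors; skip it.
    candidates = [i for i, nbrs in neighbors.items() if nbrs]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(candidates)
        j = rng.choice(neighbors[i])
        gap = rng.random()  # position along the segment from point i to point j
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
new_points = smote(minority, n_new=10, k=2, n_buckets=2)
```

Restricting the search to a bucket trades neighbor accuracy for speed: each point is compared only with the points in its own bucket instead of with every minority sample, which is the general trade-off the abstract attributes to the approximate search.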
“La Caixa” Foundation, under agreement LCF/PR/PR18/51130007. This work was supported by the Junta de Castilla y León under project BU055P20 and by the Ministry of Science and Innovation of Spain under project PID2020-119894GB-I00, co-financed through European Union FEDER funds. It was also supported by the Consejería de Educación of the Junta de Castilla y León and the European Social Fund through a pre-doctoral grant (EDU/1100/2017). This material is based upon work supported by Google Cloud.
application/pdf
eng
Elsevier
Neurocomputing. 2021, V. 464, p. 432-437
https://doi.org/10.1016/j.neucom.2021.08.086
info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007
info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0
Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
SMOTE
Imbalance
Spark
Big data
Data mining
Informática
Computer science
Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
THUMBNAIL
Juez-neurocomputing_2021.pdf.jpg
Juez-neurocomputing_2021.pdf.jpg
IM Thumbnail
image/jpeg
4933
https://riubu.ubu.es/bitstream/10259/6206/4/Juez-neurocomputing_2021.pdf.jpg
5ebbae51252e1549848c2ecdf473e2c9
MD5
4
LICENSE
license.txt
license.txt
text/plain; charset=utf-8
999
https://riubu.ubu.es/bitstream/10259/6206/3/license.txt
b295bcbce42e2caabeb0c623d3860c06
MD5
3
CC-LICENSE
license_rdf
license_rdf
application/rdf+xml; charset=utf-8
805
https://riubu.ubu.es/bitstream/10259/6206/2/license_rdf
4460e5956bc1d1639be9ae6146a50347
MD5
2
ORIGINAL
Juez-neurocomputing_2021.pdf
Juez-neurocomputing_2021.pdf
application/pdf
1068661
https://riubu.ubu.es/bitstream/10259/6206/1/Juez-neurocomputing_2021.pdf
3d2361e59c80bd769742340a4b593ead
MD5
1
10259/6206
oai:riubu.ubu.es:10259/6206
2022-11-21 13:51:46.402
Repositorio Institucional de la Universidad de Burgos
bubrep@ubu.es
The author, as sole holder of the intellectual property rights to the work, or having obtained the necessary permissions from any other rights holders, and by virtue of the rights granted by current legislation on intellectual property and copyright, AUTHORIZES the Universidad de Burgos to disseminate, free of charge, the content of the digital files corresponding to the document described above, on a non-exclusive basis and publicly in open access over the Internet, for which purpose the Library will archive them in the Institutional Repository. The author likewise authorizes the Universidad de Burgos to carry out any necessary transformations of format, not of content, to guarantee preservation and future access.

The author retains, in all cases, the right to revoke this authorization.

The assignment of rights in this work is subject to current legislation on intellectual property and copyright.