2024-03-29T00:13:57Z
https://riubu.ubu.es/oai/request
oai:riubu.ubu.es:10259/6206
2022-11-21T12:51:46Z
com_10259_5377
com_10259_5086
com_10259_2604
col_10259_5378
2021-11-23T08:25:06Z
urn:hdl:10259/6206
Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark
Juez Gil, Mario
Arnaiz González, Álvar
Rodríguez Diez, Juan José
López Nozal, Carlos
García Osorio, César
SMOTE
Imbalance
Spark
Big data
Data mining
One of the main goals of Big Data research is to find new data mining methods that can process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed; in the Big Data case, the solution must also run in an acceptable execution time. In this paper we present Approx-SMOTE, a parallel implementation of the SMOTE algorithm for the Apache Spark framework. The key difference from the original SMOTE, besides parallelism, is that it uses an approximate version of k-Nearest Neighbor, which makes it highly scalable. Although an implementation of SMOTE for Big Data already exists (SMOTE-BD), it uses an exact Nearest Neighbor search, which limits its scalability. Approx-SMOTE, on the other hand, achieves run times up to 30 times faster without sacrificing the improved classification performance offered by the original SMOTE.
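The idea the abstract describes, SMOTE interpolation driven by an approximate rather than exact neighbor search, can be illustrated with a minimal single-machine sketch. Everything in it is an assumption made for illustration: the function names (`exact_neighbors`, `sampled_neighbors`, `smote`), the `sample_size` parameter, and the toy data are hypothetical, and the random-subset search is only a crude stand-in for an approximate k-NN index. It is not the Approx-SMOTE implementation and does not use Apache Spark.

```python
import math
import random


def exact_neighbors(points, idx, k):
    """Indices of the k points closest to points[idx] (brute force)."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def sampled_neighbors(points, idx, k, rng, sample_size=32):
    """Crude approximate search: rank only a random subset of candidates.

    Assumption: a stand-in for a real approximate k-NN index,
    not the search used by the paper.
    """
    others = [j for j in range(len(points)) if j != idx]
    candidates = rng.sample(others, min(sample_size, len(others)))
    candidates.sort(key=lambda j: math.dist(points[idx], points[j]))
    return candidates[:k]


def smote(minority, n_synthetic, k=5, seed=0, approximate=True):
    """Create n_synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))      # pick a minority sample
        if approximate:
            nbrs = sampled_neighbors(minority, i, k, rng)
        else:
            nbrs = exact_neighbors(minority, i, k)
        j = rng.choice(nbrs)                  # pick one of its neighbors
        gap = rng.random()                    # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


# Toy minority class inside the unit square (hypothetical data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
new_points = smote(minority, n_synthetic=10, k=3)
```

Each synthetic point lies on the segment between a minority sample and one of its (approximate) neighbors, so swapping the neighbor search changes the cost of the method but not the interpolation step itself.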
2021-11
info:eu-repo/semantics/article
0925-2312
http://hdl.handle.net/10259/6206
10.1016/j.neucom.2021.08.086
eng
Neurocomputing. 2021, V. 464, p. 432-437
https://doi.org/10.1016/j.neucom.2021.08.086
info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007
info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119894GB-I00/ES/APRENDIZAJE AUTOMATICO CON DATOS ESCASAMENTE ETIQUETADOS PARA LA INDUSTRIA 4.0
http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivatives 4.0 International
Elsevier