Experimental evaluation of ensemble classifiers for imbalance in Big Data

Juez Gil, Mario; Arnaiz González, Álvar; Rodríguez Diez, Juan José; García Osorio, César

doi:10.1016/j.asoc.2021.107447

dc.contributor.author	Juez Gil, Mario
dc.contributor.author	Arnaiz González, Álvar
dc.contributor.author	Rodríguez Diez, Juan José
dc.contributor.author	García Osorio, César
dc.date.accessioned	2021-05-14T11:34:01Z
dc.date.available	2021-05-14T11:34:01Z
dc.date.issued	2021-09
dc.identifier.issn	1568-4946
dc.identifier.uri	http://hdl.handle.net/10259/5766
dc.description.abstract	Datasets are growing in size and complexity at a pace never seen before, forming ever larger datasets known as Big Data. A common problem for classification, especially in Big Data, is that the numerous examples of the different classes might not be balanced. Some decades ago, imbalanced classification was therefore introduced, to correct the tendency of classifiers that show bias in favor of the majority class and that ignore the minority one. To date, although the number of imbalanced classification methods has increased, they continue to focus on normal-sized datasets and not on the new reality of Big Data. In this paper, in-depth experimentation with ensemble classifiers is conducted in the context of imbalanced Big Data classification, using two popular ensemble families (Bagging and Boosting) and different resampling methods. All the experimentation was launched in Spark clusters, comparing ensemble performance and execution times with statistical test results, including the newest ones based on the Bayesian approach. One very interesting conclusion from the study was that simpler methods applied to unbalanced datasets in the context of Big Data provided better results than complex methods. The additional complexity of some of the sophisticated methods, which appears necessary to process and to reduce imbalance in normal-sized datasets, was not effective for imbalanced Big Data.	en
dc.description.sponsorship	“la Caixa” Foundation, Spain, under agreement LCF/PR/PR18/51130007. This work was supported by the Junta de Castilla y León, Spain under project BU055P20 (JCyL/FEDER, UE) co-financed through European Union FEDER funds, and by the Consejería de Educación of the Junta de Castilla y León and the European Social Fund, Spain through a pre-doctoral grant (EDU/1100/2017).	es
dc.format.mimetype	application/pdf
dc.language.iso	eng	es
dc.publisher	Elsevier	es
dc.relation.ispartof	Applied Soft Computing. 2021, V. 108, 107447	es
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Unbalance	en
dc.subject	Imbalance	en
dc.subject	Ensemble	en
dc.subject	Resampling	en
dc.subject	Big Data	en
dc.subject	Spark	en
dc.subject.other	Informática	es
dc.subject.other	Computer science	en
dc.title	Experimental evaluation of ensemble classifiers for imbalance in Big Data	en
dc.type	info:eu-repo/semantics/article	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.relation.publisherversion	https://doi.org/10.1016/j.asoc.2021.107447	es
dc.identifier.doi	10.1016/j.asoc.2021.107447
dc.relation.projectID	info:eu-repo/grantAgreement/Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona//LCF%2FPR%2FPR18%2F51130007	es
dc.relation.projectID	info:eu-repo/grantAgreement/Junta de Castilla y León//BU055P20//Métodos y Aplicaciones Industriales del Aprendizaje Semisupervisado	es
dc.journal.title	Applied Soft Computing	es
dc.volume.number	108	es
dc.page.initial	107447	es
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion	es

Files in this item

Name: juez-asc_2021.pdf
Size: 753.5Kb
Format: Adobe PDF

This item appears in the following collections

Artículos ADMIRABLE