An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data

Ramos Pérez, Ismael; Barbero Aparicio, José Antonio; Canepa Oneto, Antonio Jesús; Arnaiz González, Álvar; Maudes Raedo, Jesús M.

doi:10.3390/info15040223

dc.contributor.author	Ramos Pérez, Ismael
dc.contributor.author	Barbero Aparicio, José Antonio
dc.contributor.author	Canepa Oneto, Antonio Jesús
dc.contributor.author	Arnaiz González, Álvar
dc.contributor.author	Maudes Raedo, Jesús M.
dc.date.accessioned	2026-01-26T08:14:11Z
dc.date.available	2026-01-26T08:14:11Z
dc.date.issued	2024-04
dc.identifier.issn	2078-2489
dc.identifier.uri	https://hdl.handle.net/10259/11282
dc.description.abstract	The most common preprocessing techniques used to deal with datasets having high dimensionality and a low number of instances (wide data) are feature reduction (FR), feature selection (FS), and resampling. This study explores the use of FR and resampling techniques, expanding the limited comparisons between FR and filter FS methods in the existing literature, especially in the context of wide data. We compare the optimal outcomes from a previous comprehensive study of FS against new experiments conducted using FR methods. Two specific challenges associated with the use of FR are outlined in detail: finding FR methods that are compatible with wide data, and the need for a reduction estimator that allows nonlinear approaches to process out-of-sample data. The experimental study compares 17 techniques, including supervised, unsupervised, linear, and nonlinear approaches, using 7 resampling strategies and 5 classifiers. The results show which configurations are optimal in terms of both performance and computation time. Moreover, the best configuration, namely k-Nearest Neighbors (KNN) combined with the Maximal Margin Criterion (MMC) feature reducer and no resampling, is shown to outperform state-of-the-art algorithms.	en
dc.description.sponsorship	This work was supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE) and by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds. Ismael Ramos-Pérez is funded through a pre-doctoral grant by the Universidad de Burgos.	en
dc.format.mimetype	application/pdf
dc.language.iso	eng	es
dc.publisher	MDPI	es
dc.relation.ispartof	Information. 2024, V. 15, n. 4, 223	es
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Feature selection	en
dc.subject	Feature reduction	en
dc.subject	Wide data	en
dc.subject	High dimensional data	en
dc.subject	Imbalanced data	en
dc.subject	Machine learning	en
dc.subject.other	Informática	es
dc.subject.other	Computer science	en
dc.subject.other	Inteligencia artificial	es
dc.subject.other	Artificial intelligence	en
dc.title	An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data	en
dc.type	info:eu-repo/semantics/article	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.relation.publisherversion	https://doi.org/10.3390/info15040223	es
dc.identifier.doi	10.3390/info15040223
dc.identifier.essn	2078-2489
dc.journal.title	Information	en
dc.volume.number	15	es
dc.issue.number	4	es
dc.page.initial	223	es
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion	es
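The abstract reports that the best configuration pairs the Maximal Margin Criterion (MMC) reducer with a KNN classifier and no resampling. The following is a minimal sketch of such a pipeline, assuming the common MMC formulation: project onto the leading eigenvectors of S_b - S_w, the between-class minus the within-class scatter, with each class weighted by its prior. It is not the authors' implementation. The function names `fit_mmc` and `knn_predict`, the synthetic imbalanced data, `k=3` and `n_components=2` are all hypothetical choices made for illustration. Because MMC is linear, transforming out-of-sample data is a single matrix product, so the out-of-sample estimator problem mentioned in the abstract only arises for nonlinear reducers.

```python
import numpy as np


def fit_mmc(X, y, n_components):
    """Sketch of a Maximum Margin Criterion (MMC) projection (assumed formulation).

    Returns a (n_features, n_components) matrix W whose columns are the
    leading eigenvectors of S_b - S_w (between-class minus within-class
    scatter, each class weighted by its prior).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    overall_mean = X.mean(axis=0)
    s_b = np.zeros((n_features, n_features))
    s_w = np.zeros((n_features, n_features))
    for label in np.unique(y):
        Xc = X[y == label]
        prior = Xc.shape[0] / n_samples
        class_mean = Xc.mean(axis=0)
        diff = (class_mean - overall_mean)[:, None]
        s_b += prior * (diff @ diff.T)
        centered = Xc - class_mean
        s_w += prior * (centered.T @ centered) / Xc.shape[0]
    # S_b - S_w is symmetric, so eigh applies; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(s_b - s_w)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top]


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain Euclidean k-nearest-neighbour majority vote (hypothetical helper)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for neighbour_labels in y_train[nearest]:
        labels, counts = np.unique(neighbour_labels, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


# Hypothetical imbalanced wide data: 60 samples, 200 features, 3:1 class ratio.
rng = np.random.default_rng(0)
n_features = 200
X0 = rng.normal(size=(45, n_features))
X1 = rng.normal(size=(15, n_features))
X1[:, :20] += 4.0  # only the first 20 features carry class signal

X_train = np.vstack([X0[:30], X1[:10]])
y_train = np.array([0] * 30 + [1] * 10)
X_test = np.vstack([X0[30:], X1[10:]])
y_test = np.array([0] * 15 + [1] * 5)

# No resampling step: the reducer is fitted on the imbalanced training set as-is.
W = fit_mmc(X_train, y_train, n_components=2)
Z_train = X_train @ W
Z_test = X_test @ W  # linear map: unseen samples reuse the same W
pred = knn_predict(Z_train, y_train, Z_test, k=3)
accuracy = float(np.mean(pred == y_test))
print(f"test accuracy: {accuracy:.2f}")
```

This dense version builds two p x p scatter matrices, so memory grows quadratically with the number of features. For very wide data, a practical implementation would likely need to avoid forming them explicitly, which is one aspect of the wide-data compatibility challenge the abstract raises.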

Files in this item

Name: Ramos-information_2024.pdf
Size: 540.8 KB
Format: Adobe PDF
