Universidad de Burgos Repository
Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/4221

Title: Instance selection of linear complexity for big data
Authors: Arnaiz González, Álvar
Diez Pastor, José Francisco
Rodríguez Diez, Juan José
García Osorio, César
Published in: Knowledge-Based Systems, 2016, vol. 107, pp. 83–95
Publisher: Elsevier
Publication date: Sep-2016
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2016.05.056
Abstract: Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets. In this paper, two new algorithms with linear complexity for instance selection purposes are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. While the complexity of conventional methods (usually quadratic, O(n^2), or log-linear, O(n log n)) means that they are unable to process large data sets, the new proposal shows competitive results in terms of accuracy. Even more remarkably, it shortens execution time, as the proposal manages to reduce complexity and make it linear with respect to the data set size. The new proposal has been compared with some of the best-known instance selection methods and has also been evaluated on large data sets (up to a million instances).
Keywords: Nearest neighbor
Data reduction
Instance selection
Hashing
Big data
Subject: Informática
Computer science
License: http://creativecommons.org/licenses/by/4.0/
URI: http://hdl.handle.net/10259/4221
Publisher version: http://dx.doi.org/10.1016/j.knosys.2016.05.056
Appears in collections: Artículos ADMIRABLE
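
The abstract describes using locality-sensitive hashing to group similar instances so that selection runs in time linear in the data set size. The sketch below illustrates that general idea only; it is not the authors' algorithms, whose details are not given in this record. It assumes a random-projection hash family and a simple rule that keeps the first instance seen for each (bucket, class) pair. The names make_hash and select_instances, and the parameters n_projections, width and seed, are hypothetical.

```python
import math
import random


def make_hash(dim, n_projections, width, seed=0):
    """Build one LSH function from random Gaussian projections.

    Hypothetical helper: nearby vectors tend to fall into the same bucket.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(n_projections)]
    offsets = [rng.uniform(0.0, width) for _ in range(n_projections)]

    def bucket_of(x):
        # Project x onto each random direction and quantize into cells.
        return tuple(
            math.floor((sum(a * v for a, v in zip(plane, x)) + b) / width)
            for plane, b in zip(planes, offsets)
        )

    return bucket_of


def select_instances(data, n_projections=4, width=1.0, seed=0):
    """Keep the first instance seen for every (bucket, class) pair.

    data is a list of (feature_vector, label) pairs. Each instance is
    hashed once and looked up in a set, so the cost grows linearly
    with len(data) for a fixed number of projections and features.
    """
    if not data:
        return []
    bucket_of = make_hash(len(data[0][0]), n_projections, width, seed)
    seen = set()
    selected = []
    for x, label in data:
        key = (bucket_of(x), label)
        if key not in seen:  # assumed rule: one representative per key
            seen.add(key)
            selected.append((x, label))
    return selected


# Three identical instances of class "a" collapse to one; the instance
# of class "b" at the same point is kept because its class differs.
sample = [((0.0, 0.0), "a")] * 3 + [((0.0, 0.0), "b")]
print(len(select_instances(sample)))  # prints 2
```

A larger width or fewer projections makes buckets coarser and discards more instances; both values here are arbitrary and would need tuning for a real data set.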

Files in this item:

File                 Description  Size     Format
Arnaiz-KBS_2016.pdf               1.16 MB  Adobe PDF

This item is licensed under a Creative Commons license.


 
