Universidad de Burgos RIUBU
    Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/4221

    Title
    Instance selection of linear complexity for big data
    Authors
    Arnaiz González, Álvar
    Diez Pastor, José Francisco
    Rodríguez Diez, Juan José
    García Osorio, César
    Published in
    Knowledge-Based Systems, 2016, vol. 107, pp. 83–95
    Publisher
    Elsevier
    Publication date
    2016-09
    ISSN
    0950-7051
    DOI
    10.1016/j.knosys.2016.05.056
    Abstract
    Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the data set is medium to large in size. However, even these methods run into similar problems with very large to massive data sets. This paper presents two new instance selection algorithms with linear complexity. Both algorithms use locality-sensitive hashing to find similarities between instances. The complexity of conventional methods (usually quadratic, O(n²), or log-linear, O(n log n)) leaves them unable to process large data sets; the new proposal, by contrast, achieves competitive accuracy. More remarkably, it shortens execution time, because it reduces the complexity to linear in the data set size. The new proposal has been compared with some of the best-known instance selection methods and has also been evaluated on large data sets (up to a million instances).
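The abstract states that both algorithms rely on locality-sensitive hashing (LSH) and run in time linear in the number of instances. The sketch below illustrates that general idea only; it is not the authors' method. Several choices are our own assumptions: the hash family is the Gaussian p-stable LSH for Euclidean distance, h(x) = floor((a · x + b) / width), several hash values are concatenated into one bucket key, and the selection rule simply keeps the first instance of each class that lands in each bucket. The names `lsh_instance_selection`, `make_hash_functions`, `bucket_key` and the parameters `num_hashes`, `width`, `seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumed hash function representation: Gaussian projection vector plus offset.
HashFn = Tuple[List[float], float]


def make_hash_functions(dim: int, num_hashes: int, width: float, seed: int = 0) -> List[HashFn]:
    """Draw p-stable (Gaussian) LSH functions h(x) = floor((a . x + b) / width)."""
    rng = random.Random(seed)
    funcs: List[HashFn] = []
    for _ in range(num_hashes):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
        b = rng.uniform(0.0, width)                     # random offset in [0, width]
        funcs.append((a, b))
    return funcs


def bucket_key(x: Sequence[float], funcs: List[HashFn], width: float) -> Tuple[int, ...]:
    """Concatenate the hash values; nearby points are likely to share a key."""
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
        for a, b in funcs
    )


def lsh_instance_selection(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    num_hashes: int = 4,
    width: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """Hypothetical rule: keep the first instance of each class in each bucket.

    A single pass over the data: O(n * d * num_hashes) time, linear in n.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    if len(X) == 0:
        return []
    funcs = make_hash_functions(len(X[0]), num_hashes, width, seed)
    seen = set()
    selected: List[int] = []
    for i, (x, label) in enumerate(zip(X, y)):
        key = (bucket_key(x, funcs, width), label)
        if key not in seen:  # first instance of this class seen in this bucket
            seen.add(key)
            selected.append(i)
    return selected


if __name__ == "__main__":
    X = [[0.0, 0.0]] * 5 + [[10.0, 10.0]] * 5
    y = [0] * 5 + [1] * 5
    print(lsh_instance_selection(X, y))  # prints [0, 5]: duplicates collapse
```

Because each instance is hashed once and checked against a hash set, the cost grows linearly with n for a fixed dimensionality and number of hash functions, in contrast to methods that compare every pair of instances (O(n²)). The trade-off is that bucket membership only approximates similarity, so `width` and `num_hashes` control how aggressively the data set is reduced.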
    Keywords
    Nearest neighbor
    Data reduction
    Instance selection
    Hashing
    Big data
    Subject
    Computer science
    URI
    http://hdl.handle.net/10259/4221
    Publisher's version
    http://dx.doi.org/10.1016/j.knosys.2016.05.056
    Appears in collections
    • Artículos ADMIRABLE
    License
    Creative Commons Attribution 4.0 International
    Files in this item
    Name:
    Arnaiz-KBS_2016.pdf
    Size:
    1.129 MB
    Format:
    Adobe PDF