Universidad de Burgos RIUBU

    Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/4221

    Title
    Instance selection of linear complexity for big data
    Authors
    Arnaiz González, Álvar
    Diez Pastor, José Francisco
    Rodríguez Diez, Juan José
    García Osorio, César
    Published in
    Knowledge-Based Systems. 2016. V. 107, p. 83–95
    Publisher
    Elsevier
    Publication date
    2016-09
    ISSN
    0950-7051
    DOI
    10.1016/j.knosys.2016.05.056
    Abstract
    Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets. In this paper, two new algorithms with linear complexity for instance selection purposes are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. While the complexity of conventional methods (usually quadratic, O(n²), or log-linear, O(n log n)) means that they are unable to process large-sized data sets, the new proposal shows competitive results in terms of accuracy. Even more remarkably, it shortens execution time, as the proposal manages to reduce complexity and make it linear with respect to the data set size. The new proposal has been compared with some of the best known instance selection methods for testing and has also been evaluated on large data sets (up to a million instances).
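
The general idea in the abstract, selecting instances in a single pass by hashing similar instances into the same bucket, can be illustrated with a minimal sketch. This is not the authors' published algorithm: the function names, the hash family (random projections quantized into intervals of a fixed width), and the selection rule (keep the first instance of each class seen in each bucket) are all assumptions made for illustration only.

```python
import random
from math import floor


def make_hash_functions(dim, n_funcs=4, width=1.0, seed=0):
    """Build hypothetical random-projection hash functions h(x) = floor((a.x + b) / width)."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(n_funcs):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
        b = rng.uniform(0.0, width)                    # random offset
        funcs.append((a, b))
    return funcs


def bucket_key(x, funcs, width):
    """Concatenate the hash values of x into a single bucket identifier."""
    return tuple(
        floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
        for a, b in funcs
    )


def lsh_select(X, y, n_funcs=4, width=1.0, seed=0):
    """Return indices of kept instances: the first instance of each class per bucket.

    Each instance is hashed once and checked against a set, so the cost
    grows linearly with the number of instances (assumed selection rule).
    """
    if not X:
        return []
    funcs = make_hash_functions(len(X[0]), n_funcs, width, seed)
    seen = set()   # (bucket, class) pairs that already have a representative
    kept = []
    for i, (x, label) in enumerate(zip(X, y)):
        key = (bucket_key(x, funcs, width), label)
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept
```

For example, `lsh_select([[0.0, 0.0]] * 4, [0, 0, 1, 1])` returns `[0, 2]`: identical points always share a bucket, so only the first instance of each class is kept. How aggressively the data are reduced depends on the assumed `width` and `n_funcs` parameters, since wider intervals and fewer hash functions produce larger buckets.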
    Keywords
    Nearest neighbor
    Data reduction
    Instance selection
    Hashing
    Big data
    Subject
    Computer science
    URI
    http://hdl.handle.net/10259/4221
    Publisher's version
    http://dx.doi.org/10.1016/j.knosys.2016.05.056
    Collections
    • Artículos ADMIRABLE
    Attribution 4.0 International
    Document(s) subject to a Creative Commons Attribution 4.0 International license
    Files in this item
    Name:
    Arnaiz-KBS_2016.pdf
    Size:
    1.129Mb
    Format:
    Adobe PDF