dc.contributor.author: Arnaiz González, Álvar
dc.contributor.author: Diez Pastor, José Francisco
dc.contributor.author: Rodríguez Diez, Juan José
dc.contributor.author: García Osorio, César
dc.date.accessioned: 2016-09-01T09:42:59Z
dc.date.available: 2016-09-01T09:42:59Z
dc.date.issued: 2016-09
dc.identifier.issn: 0950-7051
dc.identifier.uri: http://hdl.handle.net/10259/4221
dc.description.abstract [en]: Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the data set is medium to large in size. However, even these methods face similar problems with very large to massive data sets. In this paper, two new instance selection algorithms with linear complexity are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. The complexity of conventional methods (usually quadratic, O(n^2), or log-linear, O(n log n)) leaves them unable to process large data sets, whereas the new proposal achieves competitive accuracy. More remarkably, it shortens execution time, because it reduces the complexity to linear with respect to the data set size. The new proposal has been compared with some of the best-known instance selection methods and has also been evaluated on large data sets (up to a million instances).
dc.description.sponsorship [en]: Supported by the Research Projects TIN 2011-24046 and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness.
dc.format.mimetype: application/pdf
dc.language.iso [es]: eng
dc.publisher [en]: Elsevier
dc.relation.ispartof [en]: Knowledge-Based Systems. 2016. V. 107, p. 83–95
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject [en]: Nearest neighbor
dc.subject [en]: Data reduction
dc.subject [en]: Instance selection
dc.subject [en]: Hashing
dc.subject [en]: Big data
dc.subject.other [es]: Informática (Computer science)
dc.subject.other [en]: Computer science
dc.title [en]: Instance selection of linear complexity for big data
dc.type [es]: Artículo (Article)
dc.type: info:eu-repo/semantics/article
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.relation.publisherversion: http://dx.doi.org/10.1016/j.knosys.2016.05.056
dc.identifier.doi: 10.1016/j.knosys.2016.05.056
dc.relation.projectID: info:eu-repo/grantAgreement/MINECO/TIN 2011-24046
dc.relation.projectID: info:eu-repo/grantAgreement/MINECO/TIN 2015-67534-P
dc.type.hasVersion [en]: info:eu-repo/semantics/publishedVersion
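
The abstract describes instance selection driven by locality-sensitive hashing (LSH) that runs in time linear in the data set size. The Python below is a minimal, hypothetical sketch of that general idea, not the authors' two algorithms. It assumes random-projection (Gaussian, p-stable) hash functions of the form floor((a . x + b) / w), concatenates `n_hashes` of them into a single bucket key, and keeps only the first instance seen for each (bucket, class) pair. The names `make_hash` and `lsh_instance_selection` and the parameters `n_hashes`, `width` and `seed` are illustrative assumptions.

```python
import math
import random


def make_hash(dim, width, rng):
    """Hypothetical helper: one random-projection LSH function.

    h(x) = floor((a . x + b) / width), with a ~ N(0, I) and b ~ U[0, width).
    Nearby points tend to receive the same integer.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

    return h


def lsh_instance_selection(X, y, n_hashes=4, width=1.0, seed=0):
    """Sketch (assumption, not the published method): keep one instance per
    (bucket, class) pair in a single pass over the data.

    Returns the indices of the selected instances in input order.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    hashes = [make_hash(dim, width, rng) for _ in range(n_hashes)]

    seen = set()  # (bucket key, class label) pairs already represented
    selected = []
    for i, (x, label) in enumerate(zip(X, y)):
        # Concatenate all hash values into one bucket key.
        bucket = tuple(h(x) for h in hashes)
        key = (bucket, label)
        if key not in seen:  # first instance of this class in this bucket
            seen.add(key)
            selected.append(i)
    return selected


if __name__ == "__main__":
    X = [[0.0, 0.0]] * 5 + [[10.0, 10.0]] * 5
    y = ["a"] * 5 + ["b"] * 5
    print(lsh_instance_selection(X, y))
```

Each instance is hashed once with `n_hashes` projections of dimension d, and each set lookup costs expected O(1). The loop therefore costs O(n * n_hashes * d), which is linear in the number of instances n when the other two factors are fixed. A larger `width` merges more neighbours into the same bucket, so fewer instances are kept.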