Repositorio Institucional de la Universidad de Burgos (Institutional Repository of the University of Burgos)
author
Arnaiz González, Álvar
author
Diez Pastor, José Francisco
author
Rodríguez Diez, Juan José
author
García Osorio, César
2016-09-01T09:42:59Z
2016-09-01T09:42:59Z
2016-09
ISSN: 0950-7051
Handle: http://hdl.handle.net/10259/4221
DOI: 10.1016/j.knosys.2016.05.056
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets.
In this paper, two new instance selection algorithms with linear complexity are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. The complexity of conventional methods, usually quadratic, O(n²), or log-linear, O(n log n), leaves them unable to process large data sets, whereas the new proposal achieves competitive accuracy. More remarkably, it shortens execution time, because it reduces complexity to linear in the data set size. The proposal has been compared with some of the best-known instance selection methods and has also been evaluated on large data sets (up to a million instances).
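The abstract says only that both algorithms use locality-sensitive hashing (LSH) to group similar instances so that selection takes a single, linear pass. The sketch below illustrates that general idea under stated assumptions and is not the authors' published algorithm: it assumes the p-stable (Euclidean) LSH family h(x) = floor((a·x + b)/w), and a hypothetical selection rule that keeps an instance whenever, in at least one hash table, its bucket does not yet contain a kept instance of the same class. The names `make_pstable_hash` and `lsh_instance_selection` and the parameters `num_tables`, `funcs_per_table` and `w` are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple


# Hypothetical helper: one hash from the p-stable (Gaussian) LSH family for
# Euclidean distance, h(x) = floor((a . x + b) / w).
def make_pstable_hash(dim: int, w: float, rng: random.Random) -> Callable[[Sequence[float]], int]:
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, w)                         # random offset in [0, w)

    def h(x: Sequence[float]) -> int:
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

    return h


def lsh_instance_selection(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    num_tables: int = 5,
    funcs_per_table: int = 3,
    w: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """Return indices of selected instances after one pass over the data.

    Assumed rule (illustrative only): keep an instance if, in at least one
    hash table, its bucket holds no previously kept instance of its class.
    """
    if not X:
        return []
    rng = random.Random(seed)
    dim = len(X[0])
    # Each table concatenates several hashes into one bucket key.
    tables = [
        [make_pstable_hash(dim, w, rng) for _ in range(funcs_per_table)]
        for _ in range(num_tables)
    ]
    # Per table: the (bucket key, class label) pairs already represented.
    seen: List[Set[Tuple[Tuple[int, ...], int]]] = [set() for _ in range(num_tables)]
    selected: List[int] = []
    for i, (x, label) in enumerate(zip(X, y)):
        keep = False
        for t, funcs in enumerate(tables):
            key = (tuple(h(x) for h in funcs), label)
            if key not in seen[t]:
                seen[t].add(key)  # this instance now represents its class here
                keep = True
        if keep:
            selected.append(i)
    return selected


# Usage: two well-separated Gaussian clusters, one per class.
rng = random.Random(42)
X = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(500)] + \
    [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(500)]
y = [0] * 500 + [1] * 500
kept = lsh_instance_selection(X, y, w=0.5)
print(f"kept {len(kept)} of {len(X)} instances")
```

Each instance is hashed `num_tables * funcs_per_table` times at O(d) cost per hash, and set lookups are O(1) on average, so the pass costs O(n · L · k · d): linear in the number of instances n, with no pairwise distance computations. Larger `w` or fewer hashes per table produce coarser buckets and therefore keep fewer instances.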
eng
Attribution 4.0 International
Nearest neighbor
Data reduction
Instance selection
Hashing
Big data
Instance selection of linear complexity for big data
info:eu-repo/semantics/article
The author, as sole holder of the intellectual property rights to the work, or holding the necessary permissions from the other rights holders, if any, and by virtue of the rights conferred by current legislation on intellectual property and copyright, AUTHORIZES the University of Burgos to disseminate, free of charge, the content of the digital files corresponding to the document described above, on a non-exclusive basis and publicly in open access via the Internet, for which purpose the Library will archive them in the Institutional Repository. The author likewise authorizes the University of Burgos to make any changes of format, but not of content, needed to guarantee preservation and future access.

The author retains, in all cases, the right to revoke this authorization.

The assignment of rights to this work is subject to current legislation on intellectual property and copyright. It will be disseminated in the Repository under a Creative Commons licence or equivalent: Attribution – NonCommercial – NoDerivatives, which permits copying, distributing and publicly communicating the work provided that the author is credited, the use is non-commercial, and no derivative works are created from the original.
File: Arnaiz-KBS_2016.pdf
URL: https://riubu.ubu.es/bitstream/10259/4221/1/Arnaiz-KBS_2016.pdf
Format: application/pdf
Size: 1184745 bytes
MD5: d4e42af8a5936dad7b8f6e543e96d24f
File: Arnaiz-KBS_2016.pdf.txt
URL: https://riubu.ubu.es/bitstream/10259/4221/6/Arnaiz-KBS_2016.pdf.txt
Format: text/plain
Size: 67394 bytes
MD5: 6389f85d4eac645f34bbf67d56f6eb1a