2024-03-29T01:11:42Z
https://riubu.ubu.es/oai/request
oai:riubu.ubu.es:10259/4221
2021-11-10T09:38:16Z
com_10259_5377
com_10259_5086
com_10259_2604
com_10259_4219
col_10259_5378
col_10259_4220
Arnaiz González, Álvar
39
500
0000-0001-6965-0237
Diez Pastor, José Francisco
156
500
Rodríguez Diez, Juan José
477
500
García Osorio, César
212
500
0000-0002-1206-1084
2016-09-01T09:42:59Z
2016-09-01T09:42:59Z
2016-09
0950-7051
http://hdl.handle.net/10259/4221
10.1016/j.knosys.2016.05.056
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets.
In this paper, two new instance selection algorithms with linear complexity are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. The complexity of conventional methods (usually quadratic, O(n^2), or log-linear, O(n log n)) makes them unable to process large data sets, whereas the new proposal achieves competitive accuracy. Even more remarkably, it shortens execution time, because its complexity is linear with respect to the data set size. The new proposal has been compared with some of the best-known instance selection methods and has also been evaluated on large data sets (up to a million instances).
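The idea in the abstract, hashing instances with locality-sensitive hashing so that similar instances collide and then selecting within buckets in a single pass, can be illustrated with a minimal sketch. This record does not describe the two algorithms themselves, so the code below is not the authors' method: the hash family (random projections quantised into cells), the rule of keeping the first instance of each class per bucket, and the names `make_hash`, `select_instances`, `n_projections` and `width` are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Hashable, List, Sequence, Tuple

# One instance is a (feature vector, class label) pair.
Instance = Tuple[Sequence[float], Hashable]


def make_hash(dim: int, n_projections: int, width: float,
              seed: int = 0) -> Callable[[Sequence[float]], Tuple[int, ...]]:
    """Build one LSH function (assumed family: random projections).

    Each projection maps x to floor((a . x + b) / width), so points that
    are close together tend to fall into the same cell.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(n_projections)]
    offsets = [rng.uniform(0.0, width) for _ in range(n_projections)]

    def bucket_of(x: Sequence[float]) -> Tuple[int, ...]:
        return tuple(
            math.floor((sum(a * v for a, v in zip(plane, x)) + b) / width)
            for plane, b in zip(planes, offsets)
        )

    return bucket_of


def select_instances(data: List[Instance], n_projections: int = 4,
                     width: float = 1.0, seed: int = 0) -> List[Instance]:
    """Keep the first instance of each class seen in each bucket.

    Hypothetical selection rule. The loop visits every instance once and
    does a constant amount of work per instance for a fixed dimension and
    number of projections, so the cost grows linearly with len(data).
    """
    if not data:
        return []
    bucket_of = make_hash(len(data[0][0]), n_projections, width, seed)
    seen = set()
    selected: List[Instance] = []
    for x, label in data:
        key = (bucket_of(x), label)
        if key not in seen:  # first instance of this class in this bucket
            seen.add(key)
            selected.append((x, label))
    return selected
```

In this sketch, `width` and `n_projections` control how aggressively the data set is reduced: wider cells and fewer projections produce larger buckets and therefore fewer retained instances. No pairwise distance is ever computed, which is what avoids the quadratic cost of nearest-neighbor-based selection.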
Supported by the Research Projects TIN 2011-24046 and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness.
application/pdf
eng
Elsevier
Knowledge-Based Systems. 2016. V. 107, p. 83–95
http://dx.doi.org/10.1016/j.knosys.2016.05.056
info:eu-repo/grantAgreement/MINECO/TIN 2011-24046
info:eu-repo/grantAgreement/MINECO/TIN 2015-67534-P
Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
Nearest neighbor
Data reduction
Instance selection
Hashing
Big data
Informática
Computer science
Instance selection of linear complexity for big data
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
CC-LICENSE
license_rdf
license_rdf
application/rdf+xml; charset=utf-8
908
https://riubu.ubu.es/bitstream/10259/4221/8/license_rdf
0175ea4a2d4caec4bbcc37e300941108
MD5
8
THUMBNAIL
Arnaiz-KBS_2016.pdf.jpg
Arnaiz-KBS_2016.pdf.jpg
IM Thumbnail
image/jpeg
3919
https://riubu.ubu.es/bitstream/10259/4221/7/Arnaiz-KBS_2016.pdf.jpg
e618d1f61323a2ed3cc538ca8ff0c262
MD5
7
ORIGINAL
Arnaiz-KBS_2016.pdf
Arnaiz-KBS_2016.pdf
application/pdf
1184745
https://riubu.ubu.es/bitstream/10259/4221/1/Arnaiz-KBS_2016.pdf
d4e42af8a5936dad7b8f6e543e96d24f
MD5
1
LICENSE
license.txt
license.txt
text/plain; charset=utf-8
1362
https://riubu.ubu.es/bitstream/10259/4221/5/license.txt
5d013bfa6e473ff0db22cd82a4d71a70
MD5
5
TEXT
Arnaiz-KBS_2016.pdf.txt
Arnaiz-KBS_2016.pdf.txt
Extracted text
text/plain
67394
https://riubu.ubu.es/bitstream/10259/4221/6/Arnaiz-KBS_2016.pdf.txt
6389f85d4eac645f34bbf67d56f6eb1a
MD5
6
10259/4221
oai:riubu.ubu.es:10259/4221
2021-11-10 10:38:16.955
Repositorio Institucional de la Universidad de Burgos
bubrep@ubu.es