RT info:eu-repo/semantics/article
T1 Variable selection for linear regression in large databases: exact methods
A1 Pacheco Bonrostro, Joaquín
A1 Casado Yusta, Silvia
K1 Variable selection
K1 Linear regression
K1 Branch & Bound methods
K1 Heuristics
K1 Economics
K1 Mathematics
AB This paper analyzes the variable selection problem in the context of linear regression for large databases. The problem consists of selecting a small subset of independent variables that can perform the prediction task optimally. This problem has a wide range of applications. One important type of application is the design of composite indicators in areas such as sociology and economics. Other important applications of variable selection in linear regression can be found in fields such as chemometrics, genetics, and climate prediction, among many others. For this problem, we propose a Branch & Bound method. This is an exact method and therefore guarantees optimal solutions. We also provide strategies that enable this method to be applied to very large databases (with hundreds of thousands of cases) in moderate computation time. A series of computational experiments shows that our method performs well compared with well-known methods in the literature and with commercial software.
PB Springer
SN 0924-669X
YR 2020
FD 2020-11
LK http://hdl.handle.net/10259/8437
UL http://hdl.handle.net/10259/8437
LA eng
NO This work was partially supported by FEDER funds and the Spanish Ministry of Economy and Competitiveness (Projects ECO2016-76567-C4-2-R and PID2019-104263RB-C44), the Regional Government of “Castilla y León”, Spain (Projects BU329U14 and BU071G19), and the Regional Government of “Castilla y León” and FEDER funds (Projects BU062U16 and COV2000375).
DS Repositorio Institucional de la Universidad de Burgos
RD 12-may-2024