Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/8437
Title
Variable selection for linear regression in large databases: exact methods
Published in
Applied Intelligence. 2021, V. 51, n. 6, p. 3736–3756
Publisher
Springer
Publication date
2020-11
ISSN
0924-669X
DOI
10.1007/s10489-020-01927-6
Abstract
This paper analyzes the variable selection problem in the context of linear regression for large databases. The problem consists of selecting a small subset of independent variables that can perform the prediction task optimally. This problem has a wide range of applications. One important type of application is the design of composite indicators in various areas (sociology and economics, for example). Other important applications of variable selection in linear regression can be found in fields such as chemometrics, genetics, and climate prediction, among many others. For this problem, we propose a Branch & Bound method. This is an exact method and therefore guarantees optimal solutions. We also provide strategies that enable this method to be applied to very large databases (with hundreds of thousands of cases) in moderate computation time. A series of computational experiments shows that our method performs well compared with well-known methods in the literature and with commercial software.
Keywords
Variable selection
Linear regression
Branch & Bound methods
Heuristics
Subject
Economics
Mathematics