Evaluation of Population Initialization Methods for Genetic Programming-based Symbolic Regression
2026-06-30 • Neural and Evolutionary Computing
Neural and Evolutionary Computing, Machine Learning
AI summary
The authors studied how different ways of starting a genetic programming process affect the accuracy and simplicity of the equations it finds for symbolic regression tasks. They tested three random starting methods and one that seeds the population with small optimized solutions from an exhaustive search. Their experiments on twelve synthetic problems and one real-world dataset showed no significant differences in the final results among the starting methods. The initial benefit of using optimized starting solutions disappeared after only a few generations. Overall, the authors concluded that as long as the starting populations are similarly diverse, the choice of starting method has a negligible effect on the final outcome.
Genetic Programming, Symbolic Regression, Initial Population, Multi-objective Optimization, NSGA-II, Pareto Front, Exhaustive Symbolic Regression, Model Complexity, Optimization Initialization, Evolutionary Algorithms
Authors
Lukas Kammerer, Gabriel Kronberger, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira, Stephan Winkler
Abstract
We analyze the effect of optimizing the initial population of genetic programming (GP) for symbolic regression (SR) on the accuracy and complexity of solutions. We compare three well-established random initialization methods as well as initialization with small optimized solutions from exhaustive symbolic regression (ESR) using a GP/SR implementation which is based on the multi-objective evolutionary algorithm NSGA-II. We compare the final Pareto fronts found with each initialization method on twelve synthetic problems of varying complexity and one real-world dataset. We find no significant differences in accuracy or model complexity among the initialization methods. The initial advantage of initialization with ESR disappears after only a few generations. Our results show that, given similar diversity in the initial population, the effect of the initialization method in GP-based symbolic regression on the final Pareto front is negligible.
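The comparison above rests on Pareto fronts over two objectives, accuracy and model complexity. As a minimal sketch of what such a front is, and not the authors' implementation, the following Python snippet filters a set of candidate models down to the non-dominated ones. The `(error, complexity)` tuple representation, the function names, and the example values are illustrative assumptions, and both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Hypothetical representation: each model is (error, complexity).
# Both objectives are assumed to be minimized.
Model = Tuple[float, int]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, deduplicated and sorted by complexity."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(set(front), key=lambda m: m[1])


if __name__ == "__main__":
    # Made-up values for illustration only, not results from the paper.
    population = [(0.90, 1), (0.40, 3), (0.45, 5), (0.10, 7), (0.10, 9)]
    print(pareto_front(population))
```

In this toy population, `(0.45, 5)` is dominated by `(0.40, 3)` and `(0.10, 9)` by `(0.10, 7)`, so three models remain on the front. The pairwise filter is quadratic in the population size; NSGA-II itself uses non-dominated sorting into multiple ranks plus a crowding distance, which this sketch does not reproduce.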