Genetic Programming on a GPU-enabled HPC cluster

I am starting to develop a symbolic regression tool for an HPC system (2 IBM POWER9 CPUs and 4 NVIDIA V100 GPUs per node). I have to tackle a non-standard problem that is very difficult to converge and for which we will have gigabytes of data. I’ve already studied some freely available symbolic regression software (both CPU and GPU) and the related publications, but I still have some doubts about which architecture is best for this kind of software. Is there anyone here with expertise in this kind of software and hardware who would be willing to discuss possible software architectures?

I admit I had to look up symbolic regression on Wikipedia. How long has this been around? I am curious because I had never heard of it.

If this is a fairly new sub-field of genetic programming, you might want to try implementing a few promising-looking variants from the literature, write up the results, and become the expert :-)

I am not sure whether it is new, and I don’t know how long it has been around. Basically, it is a sub-field of genetic programming in which you evolve formulas instead of entire programs. In theory it is not very difficult: there are several ways to encode the information needed to evolve formulas, and once you have decided on an encoding (how to manage the genes), you need to decide how much work you want to delegate to the GPU. Basically, there are three ways to use the GPU:

1) Evolution done on the CPU; NVRTC builds kernels on the fly and fitness is evaluated on the GPU, with some “pipeline” strategy to overlap kernel building and fitness evaluation.
2) Evolution done on the CPU; fitness is evaluated on the GPU by a single kernel that can evaluate the gene encoding directly (MEP, for example).
3) Everything on the GPU. At the moment I have discarded this solution because the GPU code became too complex and I still need some CPU code to handle MPI and multi-GPU stuff.
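To make option 2 a bit more concrete, here is a minimal CPU-side sketch in Python of what a single evaluator that interprets a MEP-style gene encoding directly might look like. It is only an illustration under my own assumptions: the tuple-based gene layout, the tiny operator set, and the names `evaluate_chromosome` and `fitness` are all hypothetical, and a real GPU version would run the inner loop as one kernel with roughly one thread per data row.

```python
import operator

# Hypothetical function set; a real tool would likely support more operators.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_chromosome(chromosome, row):
    """Evaluate every gene of a MEP-style chromosome on one data row.

    Assumed gene layout (illustrative only):
      ("var", i)  reads input variable i
      (op, a, b)  applies op to the values of earlier genes a and b
    """
    values = []
    for gene in chromosome:
        if gene[0] == "var":
            values.append(row[gene[1]])
        else:
            op, a, b = gene
            values.append(OPS[op](values[a], values[b]))
    return values


def fitness(chromosome, rows, targets):
    """Accumulate the squared error of each gene over the data set.

    Returns (index of the best gene, its summed squared error).
    """
    errors = [0.0] * len(chromosome)
    for row, target in zip(rows, targets):
        for g, value in enumerate(evaluate_chromosome(chromosome, row)):
            errors[g] += (value - target) ** 2
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


# Gene 3 encodes x0*x0 + x1, built from the earlier genes 2 and 1.
chromosome = [("var", 0), ("var", 1), ("*", 0, 0), ("+", 2, 1)]
rows = [(1.0, 2.0), (2.0, 3.0), (3.0, 0.5)]
targets = [x0 * x0 + x1 for x0, x1 in rows]
best_gene, best_error = fitness(chromosome, rows, targets)
```

The point of the sketch is that the chromosome is plain data, so the same evaluator handles every individual and nothing has to be compiled per candidate. That is the trade-off against option 1, where each formula becomes its own compiled kernel.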