I’ve been developing neural network code in C++ for a while now. Most of my problems involve 750-1000 variables, and I’ve had a tough time getting local optimizers to find the global minimum.

I’d love to rewrite the code to use a genetic algorithm (GA) in CUDA to speed up convergence. I have not implemented the GA in C++ yet, as the ideas of going GA and going CUDA came at (almost) the same time.

The problem is, I’m totally stuck. I’ve been reading all sorts of suggestions about memory usage, such as keeping as much as possible in shared memory. That’s only 512 elements, and a problem that may need a population matrix of approximately [100, 25000] clearly will not fit in shared memory. I’m now thoroughly confused about how to approach the problem. Initially, I wanted to move EVERYTHING to the GPU: I’d generate random numbers for the initial guess of the population matrix, and from then on keep it there.
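
To put rough numbers on the mismatch, here is a back-of-the-envelope sketch in plain C++ (host side only, no CUDA yet). It assumes the population is stored as single-precision floats and takes the 512-element shared memory figure at face value; the constant and function names are just ones I made up for illustration.

```cpp
#include <cstddef>

// Sizes taken from the description above; they are assumptions, not measurements.
constexpr std::size_t kPopRows     = 100;    // first dimension of the population matrix
constexpr std::size_t kPopCols     = 25000;  // second dimension of the population matrix
constexpr std::size_t kSharedElems = 512;    // shared memory capacity as I understand it

// Total number of elements in a rows x cols population matrix.
constexpr std::size_t populationElements(std::size_t rows, std::size_t cols) {
    return rows * cols;
}

// Storage needed for the matrix, assuming one float per element.
constexpr std::size_t populationBytes(std::size_t rows, std::size_t cols) {
    return populationElements(rows, cols) * sizeof(float);
}

// Number of shared-memory-sized tiles needed to cover the matrix once (rounded up).
constexpr std::size_t tilesNeeded(std::size_t elems, std::size_t tileElems) {
    return (elems + tileElems - 1) / tileElems;
}
```

With these numbers, `populationElements(kPopRows, kPopCols)` is 2,500,000 elements, and `tilesNeeded(2500000, kSharedElems)` is 4883, so the matrix is several thousand times larger than what I could hold in shared memory at once.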

Can someone please give me some direction on how to go about this code? I can explain the math, if necessary. I am just totally stuck now.

Thanks,

Michael