One thought I had yesterday was to implement a neural network on the GPU. The network would be constructed on the host (using only supervised learning for now, not unsupervised) and then copied to device memory.
So, the main difficulty is storing the network in a two-dimensional array. In my experience, using pointers in CUDA reduces performance for several reasons (for example, every pointer has to be rebuilt so that it points into device memory, which is difficult and very slow).
Furthermore, by mapping the data structure to a table, I could bind it to a texture
and gain significant performance from the texture fetches.
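
To make the table idea concrete, here is a minimal host-side sketch in C++. It assumes a fully connected feed-forward network with sigmoid activations, which is only one possible reading of "the network". The names `FlatNetwork`, `makeFlatNetwork`, `weightAt` and `forward` are hypothetical, made up for illustration. Each non-input neuron gets one row; each row holds its incoming weights followed by a bias column, and every row has the same width, so no pointers are stored inside the table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat layout (an assumption, not an established format):
// one row per non-input neuron, one column per incoming weight, plus a
// bias column at index fan-in. All rows share the same pitch.
struct FlatNetwork {
    std::vector<int> layerSizes;  // e.g. {2, 3, 1}
    std::vector<int> rowOffset;   // first table row of each layer (entry 0 unused)
    int pitch = 0;                // columns per row = largest fan-in + 1
    std::vector<float> table;     // rows * pitch floats, row-major, zero-initialised
};

FlatNetwork makeFlatNetwork(const std::vector<int>& layerSizes) {
    assert(layerSizes.size() >= 2);
    FlatNetwork net;
    net.layerSizes = layerSizes;
    net.rowOffset.assign(layerSizes.size(), 0);
    int rows = 0;
    int maxFanIn = 0;
    for (std::size_t l = 1; l < layerSizes.size(); ++l) {
        net.rowOffset[l] = rows;
        rows += layerSizes[l];
        if (layerSizes[l - 1] > maxFanIn) maxFanIn = layerSizes[l - 1];
    }
    net.pitch = maxFanIn + 1;
    net.table.assign(static_cast<std::size_t>(rows) * net.pitch, 0.0f);
    return net;
}

// Weight from neuron `from` of the previous layer to neuron `to` of `layer`.
// Passing from == fan-in of that layer selects the bias column.
float& weightAt(FlatNetwork& net, int layer, int to, int from) {
    assert(layer >= 1 && layer < static_cast<int>(net.layerSizes.size()));
    assert(to >= 0 && to < net.layerSizes[layer]);
    assert(from >= 0 && from <= net.layerSizes[layer - 1]);
    std::size_t row = static_cast<std::size_t>(net.rowOffset[layer] + to);
    return net.table[row * net.pitch + from];
}

// Reference forward pass on the host, reading only from the flat table.
std::vector<float> forward(FlatNetwork& net, std::vector<float> activations) {
    assert(static_cast<int>(activations.size()) == net.layerSizes[0]);
    for (int layer = 1; layer < static_cast<int>(net.layerSizes.size()); ++layer) {
        int fanIn = net.layerSizes[layer - 1];
        std::vector<float> next(net.layerSizes[layer]);
        for (int to = 0; to < net.layerSizes[layer]; ++to) {
            float sum = weightAt(net, layer, to, fanIn);  // start from the bias
            for (int from = 0; from < fanIn; ++from) {
                sum += weightAt(net, layer, to, from) * activations[from];
            }
            next[to] = 1.0f / (1.0f + std::exp(-sum));    // sigmoid (assumed)
        }
        activations = next;
    }
    return activations;
}
```

Because `table` is one contiguous block with a fixed row width, it could in principle be copied to the device in a single transfer and then used as the backing store for a 2D texture. The cost is some padding: layers with a smaller fan-in leave unused columns at the end of their rows.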
What do you think?