1GB of RAM / neuron/network simulations

To start using CUDA, I am considering buying an 8600GT or 8800GT. I noticed there are a few cards with 1GB of RAM. I wonder whether all of that RAM is available to all resources on the card, i.e. whether it is possible to use the full 1GB?

I am quite experienced in programming, but I have almost no experience with parallel computing. I have read some material, but I want to be sure about a few points. My aim is (pseudo-)realistic neuron and network simulations, so I have many similar, densely connected agents. Each agent is modelled basically as an ODE of some order (although I may need to take space into account, in which case the ODE becomes a PDE). So, what I wonder is whether CUDA is appropriate for this. I assume I should have a thread for each neuron, but is it possible to access other neurons' data (near, or maybe far)? As I said, I am a newbie at this, so maybe this sounds naive. Is there anyone here doing realistic neuron/network simulations?
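To make the "thread per neuron" idea concrete, here is a minimal serial sketch of the per-neuron update — all names and the leaky integrate-and-fire model are my own illustration, not anything from the thread. On the GPU this loop body would be one CUDA thread, and "accessing other neurons' data" is just an indexed read into the shared state array in global memory:

```c
#include <stddef.h>

/* One Euler step of dv/dt = (-v + sum_j w[i][j] * v[j]) / tau for each
   neuron i. In CUDA the outer loop disappears: the body becomes a
   __global__ kernel with i = blockIdx.x * blockDim.x + threadIdx.x,
   and v[] lives in device global memory, readable by every thread. */
void neuron_step(float *v_next, const float *v, const float *w,
                 size_t n, float tau, float dt)
{
    for (size_t i = 0; i < n; i++) {
        float input = 0.0f;
        for (size_t j = 0; j < n; j++)   /* gather other neurons' state */
            input += w[i * n + j] * v[j];
        v_next[i] = v[i] + dt * (-v[i] + input) / tau;
    }
}
```

Writing into `v_next` rather than `v` matters on the GPU too: threads run concurrently, so each step must read the old state and write a separate buffer.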

Thank you…


Look at this company; they are using CUDA to do realistic neuron simulations:


If you need a lot of memory, a Tesla C870 has 1.5 GB.

I think CUDA is really well suited to neural network simulations. Generally, parallelism is good for CUDA, and neural networks have a very high degree of inherent parallelism.

And yes, almost all of the RAM can be used for any purpose. There is some overhead, but as long as you don't run X on the card, it's never more than about 25 MB.

Yes, I think the same: a highly parallel electrical machine for a highly parallel biological machine :)

I have tried a 1GB version of the 8800GT (also for neural networks, though the artificial kind). You can address all the memory as usual, but I got quite bad performance: only half the memory bandwidth of the 512MB cards. Despite asking the vendor, the manufacturer and this forum about it, I still have no clue whether this is related to the 1GB design or not.

I can attest to CUDA's usefulness in NN programming. Just a practical thought: I too had initially planned to use a single thread to represent each node. It turns out that if you have a larger network requiring many interconnections, it is faster (from a memory-access standpoint) to have each thread represent a distinct neural connection within the network.

Example: if each of 512x512 nodes is connected to its 24 closest neighbors (2 degrees of separation), then running blocks of 5x5 threads per node per generation works out pretty well (you can offset the different blocks in the kernel to avoid bank conflicts, giving you around 8 nodes calculated per thread block).
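One way to sketch the connection-per-thread indexing for that 5x5 neighborhood (my own illustration, assuming a 512x512 grid that wraps at the edges): each thread coordinate in the 5x5 block maps to one offset around the node, and the offset is turned into a flat index into the state array.

```c
/* Map thread (tx, ty) of a 5x5 block onto one connection of node (x, y)
   in a wrapping 512x512 grid. Thread coords 0..4 become offsets -2..+2;
   the center offset (0, 0) is the node itself, the other 24 threads are
   its 24 closest neighbors (2 degrees of separation). */
#define GRID 512

int neighbor_index(int x, int y, int tx, int ty)
{
    int dx = tx - 2;
    int dy = ty - 2;
    int nx = (x + dx + GRID) % GRID;   /* wrap around the grid edges */
    int ny = (y + dy + GRID) % GRID;
    return ny * GRID + nx;             /* flat index into the state array */
}
```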

I am running 2x Tesla cards with data sets of 512x512x300 x sizeof(float4) => around 1.25GB loaded onto each card, and it works well as long as you can do the entire network from start to finish in device code.

moulik, thank you very much. I really need some practical ideas for NN simulation. I was planning to have a thread for each neuron, but then, because of limits on thread count, I started to think of just mapping the computations onto CUDA's architecture (not a structural mapping). I know it depends on the neural architecture and the level of simulation; it seems the best and only way is to try and see. Are you doing this with artificial neurons, or with biological neurons at some level of detail?

I use CUDA with Tesla to simulate neuron networks, and the fastest way to do it is to represent all of the network's tuning and processing actions as matrices (including activation and so on). Then all your work reduces to matrix-matrix and matrix-vector operations, which cuBLAS provides. The rest you can develop independently.

I would take a look at Bill Langdon's work at the University of Essex in the UK.

I saw him running around the self-evolving machines conference in the summer with a lot of AI/GP software running on the GPU in his Mac and Windows boxes.

Good luck!