large thread question

Hi, I have a question about using threads with large memory requirements. I have about 5 million independent processes that will each require 2 MB of memory. That is far too much for local memory, so I would have to use global memory, right? But it seems to be too much for global memory as well. How can I split up the execution so that only a certain number of threads run at any given time and I have enough memory for them? Thanks!

Does it make any sense in this context to use a for loop that runs parts of the calculation? E.g. if you have enough memory for 500 blocks, can you do something like this:

for (int i = 0; i < 5000000; i += 500 * 512)  // each launch covers 500 blocks x 512 threads
{
    runGPUStuff<<<500, 512>>>(dataIndex[i]);  // the kernel must skip indices >= 5000000 in the last launch
}

? Or is your problem somehow different?
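A slightly fuller sketch of the same idea, in case it helps. With 5 million items at 2 MB each, the total is roughly 10 TB, so the scratch memory has to be reused from one launch to the next. The version below sizes each batch from the free memory reported by cudaMemGetInfo and launches one thread per item. The kernel body, the helper itemsPerBatch, the constant names, and the choice to use half of the free memory are all assumptions made for illustration, not something taken from your code.

```cuda
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Sizes taken from the question: 5 million work items, 2 MB of scratch each.
constexpr size_t kTotalItems   = 5000000;
constexpr size_t kBytesPerItem = 2u * 1024 * 1024;

// Hypothetical helper: how many items fit in budgetBytes (never more than totalItems).
static size_t itemsPerBatch(size_t budgetBytes, size_t bytesPerItem, size_t totalItems)
{
    size_t n = budgetBytes / bytesPerItem;
    return n < totalItems ? n : totalItems;
}

// Placeholder kernel: each thread owns one slice of the shared scratch buffer.
__global__ void runGPUStuff(unsigned char *scratch, size_t firstItem,
                            size_t itemsInBatch, size_t bytesPerItem)
{
    size_t local = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (local >= itemsInBatch) return;           // guard the partial last block
    unsigned char *mine = scratch + local * bytesPerItem;
    size_t item = firstItem + local;             // global index of this work item
    mine[0] = (unsigned char)(item & 0xFF);      // stand-in for the real calculation
}

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return 1;

    // Assumption: spend half of the currently free global memory on scratch space.
    size_t batch = itemsPerBatch(freeBytes / 2, kBytesPerItem, kTotalItems);
    if (batch == 0) { fprintf(stderr, "not enough memory for a single item\n"); return 1; }

    unsigned char *scratch = nullptr;
    if (cudaMalloc((void **)&scratch, batch * kBytesPerItem) != cudaSuccess) return 1;

    const int threadsPerBlock = 256;
    for (size_t first = 0; first < kTotalItems; first += batch) {
        size_t n = std::min(batch, kTotalItems - first);   // last batch may be smaller
        int blocks = (int)((n + threadsPerBlock - 1) / threadsPerBlock);
        runGPUStuff<<<blocks, threadsPerBlock>>>(scratch, first, n, kBytesPerItem);
        // Wait for this batch so errors surface before the buffer is reused.
        if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess) {
            fprintf(stderr, "batch starting at %zu failed\n", first);
            cudaFree(scratch);
            return 1;
        }
    }
    cudaFree(scratch);
    return 0;
}
```

Note that the batch size here is limited by memory, not by how many blocks the hardware can keep resident: a launch can contain more blocks than fit on the device at once, and the extra blocks simply wait their turn.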

Sorry, I am a bit new, so I may be asking very simple questions. I think that could be right, but how can I tell how many blocks I can run concurrently? Does the block/thread distinction come into play if I am only using global memory?
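On the concurrency question, one way to find out is to ask the runtime. The sketch below uses cudaGetDeviceProperties for the number of multiprocessors and cudaOccupancyMaxActiveBlocksPerMultiprocessor for how many blocks of a given kernel can be resident on each one. The kernel is a placeholder and residentBlocks is a hypothetical helper; the real numbers depend on your kernel's register and shared memory use. The block/thread split still matters with global memory only: global memory is visible to every thread in every block, but blocks are the unit that gets scheduled onto a multiprocessor, and each thread still needs its block and thread index to find its own data.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real per-item calculation.
__global__ void runGPUStuff(float *data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: blocks resident on the whole device at once.
static int residentBlocks(int blocksPerSM, int smCount)
{
    return blocksPerSM * smCount;
}

int main()
{
    const int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 1;

    const int threadsPerBlock = 512;
    int blocksPerSM = 0;
    // Last argument is the dynamic shared memory per block (none in this sketch).
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, runGPUStuff, threadsPerBlock, 0) != cudaSuccess) return 1;

    int blocks = residentBlocks(blocksPerSM, prop.multiProcessorCount);
    printf("multiprocessors:           %d\n", prop.multiProcessorCount);
    printf("resident blocks per SM:    %d\n", blocksPerSM);
    printf("resident blocks in total:  %d\n", blocks);
    printf("resident threads in total: %d\n", blocks * threadsPerBlock);
    printf("global memory (bytes):     %zu\n", (size_t)prop.totalGlobalMem);
    return 0;
}
```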