I have a “puzzle” that I need implemented on parallel processors, and it has to be done in CUDA to work with the rest of the program. After thinking about it for a while, I decided it would be best to ask the community for opinions on how to do it. I will try to describe it in very simple English, because it is hard to understand otherwise.
I have a kernel that runs 64 blocks with 32 threads per block. Each thread has a register variable ‘int myInt;’ and another piece of data ‘int data;’. myInt is a random number between 0 and 50. I need each thread's data value written to linear memory (global space), grouped according to myInt, but only if myInt is not 0. If SOME of the data is lost that is OK, but I’d like to keep as much of it as possible.
So ideally this is what will happen (in this example myInt is between 1 and 9):
Each pair of integers is in register memory in a thread:

myInt   data
8       123
4       346
6       2
8       7445
7       23421
3       6516
8       3262
6       23445
3       93

would look like this in global memory (order of data doesn't matter):

globalPointer: 6516, 93
globalPointer: 346
globalPointer: 2, 23445
globalPointer: 23421
globalPointer: 7445, 123, 3262
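To make the goal concrete, here is a minimal sketch of one possible approach. It is not a tuned or definitive solution. It assumes one fixed-capacity bucket in global memory per myInt value, plus one counter per bucket that threads bump with atomicAdd to claim a slot. All names here (scatterByKey, binOnGpu, NUM_KEYS, BUCKET_CAP, buckets, counts) are made up for illustration, and the capacity of 128 is a guess. The kernel reads myInt and data from input arrays only as a stand-in for the register values described above. Anything that does not fit in a bucket is dropped, which matches the "some loss is OK" requirement.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sizes, chosen for illustration only.
constexpr int NUM_KEYS   = 51;   // myInt ranges over 0..50; key 0 is skipped
constexpr int BUCKET_CAP = 128;  // assumed max entries kept per key; extras are dropped

// Each thread holds one (myInt, data) pair and appends data to the bucket for myInt.
__global__ void scatterByKey(const int* keys, const int* values, int n,
                             int* buckets, int* counts)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int myInt = keys[tid];    // stand-in for the per-thread register variable
    int data  = values[tid];
    if (myInt <= 0 || myInt >= NUM_KEYS) return;  // 0 means "nothing to store"

    // Claim a slot in this key's bucket; atomicAdd returns the previous count.
    int slot = atomicAdd(&counts[myInt], 1);
    if (slot < BUCKET_CAP)
        buckets[myInt * BUCKET_CAP + slot] = data;  // otherwise this value is lost
}

// Host helper: runs the kernel and returns one vector of data values per key.
// CUDA error checking is omitted to keep the sketch short.
std::vector<std::vector<int>> binOnGpu(const std::vector<int>& keys,
                                        const std::vector<int>& values)
{
    assert(keys.size() == values.size());
    std::vector<std::vector<int>> result(NUM_KEYS);
    const int n = static_cast<int>(keys.size());
    if (n == 0) return result;

    int *dKeys = nullptr, *dVals = nullptr, *dBuckets = nullptr, *dCounts = nullptr;
    cudaMalloc(&dKeys, n * sizeof(int));
    cudaMalloc(&dVals, n * sizeof(int));
    cudaMalloc(&dBuckets, NUM_KEYS * BUCKET_CAP * sizeof(int));
    cudaMalloc(&dCounts, NUM_KEYS * sizeof(int));
    cudaMemcpy(dKeys, keys.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCounts, 0, NUM_KEYS * sizeof(int));  // all buckets start empty

    const int threads = 32;                          // 32 threads per block, as in the post
    const int blocks  = (n + threads - 1) / threads;
    scatterByKey<<<blocks, threads>>>(dKeys, dVals, n, dBuckets, dCounts);

    // The blocking copies below also wait for the kernel to finish.
    std::vector<int> hBuckets(NUM_KEYS * BUCKET_CAP), hCounts(NUM_KEYS);
    cudaMemcpy(hCounts.data(), dCounts, NUM_KEYS * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(hBuckets.data(), dBuckets, NUM_KEYS * BUCKET_CAP * sizeof(int),
               cudaMemcpyDeviceToHost);

    for (int k = 0; k < NUM_KEYS; ++k) {
        // A counter can exceed the capacity when values were dropped, so clamp it.
        int kept = std::min(hCounts[k], BUCKET_CAP);
        result[k].assign(hBuckets.begin() + k * BUCKET_CAP,
                         hBuckets.begin() + k * BUCKET_CAP + kept);
    }

    cudaFree(dKeys); cudaFree(dVals); cudaFree(dBuckets); cudaFree(dCounts);
    return result;
}
```

Calling binOnGpu with the nine pairs from the example should give two values under key 3, one under 4, two under 6, one under 7 and three under 8, in no particular order within each bucket. With 2048 threads and 51 possible keys, many threads would hit the same counter, so the atomics may become a bottleneck. This sketch is only meant to pin down the desired result, not to be the fastest way to get it.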