Only part of the data is processed?

Hi, I am running into a problem where my CUDA code processes only part of the data. Exactly blocksize*gridsize elements are processed. So if I set 256 threads per block and launch 32 blocks per grid, only elements 0 to 8191 get processed, even though my array is 50000 long…

What could have caused this?

You did not show the code that does the processing, so I can only guess. My best guess is that your code is set up to process exactly one array element per thread, and since you have 256 * 32 = 8192 threads in total, only array elements 0 through 8191 get processed. If this is indeed so, you could either increase the number of threads by launching a grid with enough thread blocks to cover the whole array, or have each thread loop over multiple array elements.
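
Since I have not seen your kernel, here is only a rough sketch of both options. The kernel name `scale`, the operation (doubling each element), and the `float` element type are made up for illustration; only the array length of 50000 and the 256 x 32 launch configuration come from your post. The kernel uses a grid-stride loop, so it works with your original 32 blocks, and the second launch shows how to compute a block count that covers the array with one element per thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel (not the poster's code): doubles every element.
// The grid-stride loop lets any launch configuration cover all n elements,
// and the i < n test guards against running past the end of the array.
__global__ void scale(float *data, int n)
{
    int stride = blockDim.x * gridDim.x;  // total number of threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= 2.0f;
    }
}

int main()
{
    const int n = 50000;              // array length from the question
    const int threadsPerBlock = 256;  // block size from the question

    float *h = (float *)malloc(n * sizeof(float));
    if (h == NULL) return 1;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d = NULL;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Option A: keep the original 32 blocks; each thread loops over
    // several elements via the grid-stride loop.
    scale<<<32, threadsPerBlock>>>(d, n);

    // Option B: launch enough blocks for one element per thread.
    // Round up so a partial last block is included.
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Both launches doubled every element, so each should now be 4.0f.
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 4.0f) ++bad;
    }
    printf("%d of %d elements were not fully processed\n", bad, n);

    cudaFree(d);
    free(h);
    return bad != 0;
}
```

If your kernel really does handle one element per thread without a loop, option B alone would be enough, provided the kernel checks `i < n` before touching the array, because the last block will usually contain threads that fall past the end.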