__global__ void get_string( char *L, char *buff, int *buf_index, int b_size )
{
    // "L", "buff" and "buf_index" all have a size of "b_size"
    int tid = threadIdx.x + (blockIdx.x * blockDim.x);
    if ( tid < b_size )                          //line 01
    {                                            //line 02
        if ( buf_index[tid] > 0 )                //line 03
            L[tid] = buff[ buf_index[tid]-1 ];   //line 04
        else                                     //line 05
            L[tid] = buff[ b_size-1 ];           //line 06
    }                                            //line 07
    // if (tid < b_size)                         //line 08
    //     L[tid]=buff[tid];                     //line 09
}

int main()
{
    // code to setup device pointers
    // code to transfer "dev_L, dev_buffer, dev_bufferindex, buffer_size" to device using cudaMemcpyHostToDevice
    get_string<<< 8, 128 >>>( dev_L, dev_buffer, dev_bufferindex, buffer_size );
    // 8 blocks x 128 threads = 1024 threads for the arrays "L", "buff" and "buf_index", each having a size of 1024
    // code to get "L" from device using cudaMemcpyDeviceToHost
}
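The launch above hard-codes 8 blocks of 128 threads, which covers exactly 1024 elements. As a minimal sketch, the block count could instead be derived from the array size, so that other sizes are covered too. The helper name `blocks_for` and its parameter `threads_per_block` are hypothetical and are not part of the original code.

```cpp
#include <cassert>

// Hypothetical helper (not in the original post): smallest number of blocks
// such that blocks * threads_per_block >= n, computed by ceiling division.
int blocks_for(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

With such a helper the launch could read `get_string<<< blocks_for(buffer_size, 128), 128 >>>( dev_L, dev_buffer, dev_bufferindex, buffer_size )`. The `tid < b_size` test on "line 01" already makes any surplus threads in the last block do nothing.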
Now, if the size of “L”, “buff” and “buf_index” is small, for example less than 10, the lines labeled “line 01” through “line 07” work, and the values stored in the array “L” are correct.
But if the size of “L”, “buff” and “buf_index” is large, for example 1024, the lines labeled “line 01” through “line 07” do NOT work. The program still runs, but every time I run the compiled program with the same values stored in “L”, “buff” and “buf_index”, I get different values stored in the array “L”. If I remove “line 01” through “line 07”, uncomment “line 08” and “line 09”, and recompile the program, the values stored in the array “L” are correct for that version, even though they are not the values I want stored in “L”.
What I want is for my program to work using “line 01” through “line 07” with a size of 1024 or greater.
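One possible way to narrow this down, sketched here under assumptions, is to check the index array on the host and to compare the GPU result against a CPU reference. The kernel reads `buff[ buf_index[tid]-1 ]`, so any entry of “buf_index” greater than the array size would read outside “buff”; whether that is what happens here is only a guess. The function names `count_out_of_range` and `get_string_host` are hypothetical, and the sketch assumes the host still holds copies of “buff” and “buf_index” in `std::vector`s.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check (not in the original post): counts entries of buf_index
// that would make "line 04" read past the end of buff, i.e. entries > b_size.
std::size_t count_out_of_range(const std::vector<int>& buf_index, std::size_t b_size)
{
    std::size_t bad = 0;
    for (std::size_t i = 0; i < buf_index.size(); ++i)
        if (buf_index[i] > 0 && static_cast<std::size_t>(buf_index[i]) > b_size)
            ++bad;
    return bad;
}

// Hypothetical CPU reference that mirrors "line 01" to "line 07" of the kernel,
// with one loop iteration per thread id. Call it only after the check above
// reports zero out-of-range entries.
std::vector<char> get_string_host(const std::vector<char>& buff,
                                  const std::vector<int>& buf_index)
{
    assert(buf_index.size() == buff.size());
    const std::size_t b_size = buff.size();
    std::vector<char> L(b_size);
    for (std::size_t tid = 0; tid < b_size; ++tid) {
        if (buf_index[tid] > 0)
            L[tid] = buff[buf_index[tid] - 1];
        else
            L[tid] = buff[b_size - 1];
    }
    return L;
}
```

If the check reports no bad entries and the array copied back from the device still differs from the result of `get_string_host`, the problem is more likely in the allocation or copy code that is not shown here than in the kernel itself.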
So could anyone help me out with this?
I am running:
Windows XP SP2 [32-bit]
GeForce GTX 280
CUDA Toolkit 3.2
NVIDIA driver 260.99
Visual Studio 2008