Blocks bigger than 512 threads: I can't see the error


I have a simple program with an array holding the following values: a[0]=0; a[1]=1; a[2]=2; … a[n]=n;

The kernel is supposed to add one to each element: a[0]=1; a[1]=2; … a[n]=n+1;

I use a single block, and the variable "size" holds the number of threads, which is also the size of the array.

If size is 512 or less, everything is correct. But if size is, for example, 560, everything is wrong: the result is identical to the original array (no change).

I know that the maximum number of threads per block is 512.
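The limit depends on the device, so it can be read at run time instead of being assumed. A minimal sketch, assuming device 0 is the GPU in use, that prints the relevant limits with cudaGetDeviceProperties:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int device = 0;  // assumption: the first CUDA device is the one being used
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // maxThreadsPerBlock bounds the total thread count of one block;
    // maxThreadsDim[0] bounds the x dimension alone.
    printf("%s: max threads per block = %d, max block dim x = %d\n",
           prop.name, prop.maxThreadsPerBlock, prop.maxThreadsDim[0]);
    return 0;
}
```

A launch such as PLUS<<<1, size>>> only has a chance of working when size does not exceed the reported maxThreadsPerBlock.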

I want to know how I can see the error! I tried this, but it doesn't help: CUT_CHECK_ERROR("Kernel execution failed");


#include <stdio.h>
#include <stdlib.h>
#include <cutil.h>  // CUDA_SAFE_CALL and CUT_CHECK_ERROR (CUDA SDK utility header)

__global__ void PLUS(float* C)
{
    int i = threadIdx.x;
    C[i] = C[i] + 1;  // add one to this thread's element
}

int main(int argc, char** argv)
{
    if (argc != 2) {
        printf("wrong number of arguments!!!!!!\n");
        return 1;
    }

    unsigned int size = 512;  // if size is bigger than 512 the array isn't changed
    unsigned int mem_size = sizeof(float) * size;

    // Host array initialised to 0, 1, 2, ...
    float* h_C = (float*) malloc(mem_size);
    for (unsigned int i = 0; i < size; i++)
    {
        h_C[i] = (float)i;
    }

    float* d_C;
    CUDA_SAFE_CALL(cudaMalloc((void**) &d_C, mem_size));
    CUDA_SAFE_CALL(cudaMemcpy(d_C, h_C, mem_size, cudaMemcpyHostToDevice));

    // One block of "size" threads
    PLUS<<<1, size>>>(d_C);
    CUT_CHECK_ERROR("Kernel execution failed");

    CUDA_SAFE_CALL(cudaMemcpy(h_C, d_C, mem_size, cudaMemcpyDeviceToHost));
    CUT_CHECK_ERROR("Memcpy execution failed");

    return 0;
}

What is the command to see the error?

Thank you,

CUT_CHECK_ERROR is a no-op in release builds. Just look up its definition in the header file and you will see.

Call cudaThreadSynchronize() followed by cudaGetLastError() to get any error code from a kernel launch. There is a function for converting the error code to a human-readable string, too. Just look it up in the reference manual.
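As a rough sketch of that advice (not the original poster's program): the `check` helper and the stand-alone `main` below are made up for illustration. `cudaGetErrorString` is the runtime call that turns an error code into text, and `cudaDeviceSynchronize()` is used here because newer toolkits deprecate `cudaThreadSynchronize()`; on an old toolkit, swap it back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void PLUS(float* C)
{
    int i = threadIdx.x;
    C[i] += 1.0f;
}

// Hypothetical helper: prints a readable message when err is not cudaSuccess.
static bool check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const unsigned int size = 560;  // above the 512-thread limit from the question
    float* d_C = NULL;

    if (!check(cudaMalloc((void**)&d_C, size * sizeof(float)), "cudaMalloc"))
        return 1;
    if (!check(cudaMemset(d_C, 0, size * sizeof(float)), "cudaMemset"))
        return 1;

    PLUS<<<1, size>>>(d_C);

    // A rejected launch configuration is reported here, without any macro.
    bool ok = check(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs surface at the next synchronization.
    ok = check(cudaDeviceSynchronize(), "kernel execution") && ok;

    cudaFree(d_C);
    return ok ? 0 : 1;
}
```

Because these checks are plain function calls rather than macros, they behave the same in debug and release builds. On a device limited to 512 threads per block, the launch check would be expected to report an invalid configuration for size = 560; on a device with a larger limit the launch is valid and nothing is printed.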

Thank you, MisterAnderson42, but there isn't any error code. It simply looks like the program doesn't run the kernel.

Why is there no error?
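Whatever the reason no error shows up, a single block cannot cover 560 elements on a device limited to 512 threads per block. One common approach, sketched here under assumptions, is to spread the array over several blocks and guard against the surplus threads in the last block. The kernel name `plusOne`, the block size of 256, and the extra length parameter `n` are illustrative choices, not part of the original program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread handles one element; threads past the end of the array do nothing.
__global__ void plusOne(float* C, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] += 1.0f;
}

int main()
{
    const unsigned int size = 560;
    const unsigned int threadsPerBlock = 256;  // assumption: any value within the device limit
    const unsigned int blocks = (size + threadsPerBlock - 1) / threadsPerBlock;  // round up
    const size_t mem_size = sizeof(float) * size;

    float* h_C = (float*)malloc(mem_size);
    if (h_C == NULL)
        return 1;
    for (unsigned int i = 0; i < size; i++)
        h_C[i] = (float)i;

    float* d_C = NULL;
    cudaError_t err = cudaMalloc((void**)&d_C, mem_size);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_C, h_C, mem_size, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        plusOne<<<blocks, threadsPerBlock>>>(d_C, size);
        err = cudaGetLastError();  // reports a rejected launch configuration
    }

    // The blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_C, d_C, mem_size, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    else
        printf("h_C[0] = %.1f, h_C[%u] = %.1f\n", h_C[0], size - 1, h_C[size - 1]);

    cudaFree(d_C);
    free(h_C);
    return err == cudaSuccess ? 0 : 1;
}
```

With 560 elements and 256 threads per block this launches 3 blocks (768 threads), and the `i < n` test keeps the last 208 threads from writing past the end of the array.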