Strange problem with array, loop and if


I am having a strange problem with this kernel:

    const unsigned int BLOCK_SIZE = 64;

    __global__ void determineCombinedSteering_kernel(
        const unsigned int GLOBAL_ARRAY_SIZE,
        const float3* const vehicle_forward,
        float3* const result)
    {
        // the index to the data in the global data field
        const unsigned int index = BLOCK_SIZE * blockIdx.x + threadIdx.x;

        const float3 local_vehicle_position = vehicle_forward[index];

        const int LOCAL_ARRAY_SIZE = 2;
        float3 neighbor_position[LOCAL_ARRAY_SIZE];
        neighbor_position[0] = make_float3(0.0f, 0.0f, 0.0f);

        int j = 0;
        for (int i = 0; i < GLOBAL_ARRAY_SIZE; ++i) {
            if (j < LOCAL_ARRAY_SIZE) {
                neighbor_position[j] = vehicle_forward[i];
            }
        }

        result[index] = neighbor_position[0];
    }


The kernel fails with the error message “unspecified launch failure”. I have no idea why this is happening. :( As far as I understand CUDA, this code should execute correctly.

If I change the loop to

    for (int i = 0; i < LOCAL_ARRAY_SIZE; ++i) {
        neighbor_position[i] = vehicle_forward[i];
    }


the kernel executes without a problem.

Can anyone tell me why the kernel crashes?
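One way to separate a logic error from a launch or compiler problem is to run the same loop body on the host. The sketch below is plain C++ (it also compiles as host code under nvcc); Float3, global_loop_version and local_loop_version are hypothetical stand-ins for CUDA's float3 and for the two loop versions shown above. In both versions every write to neighbor_position uses an index below LOCAL_ARRAY_SIZE. As the first loop appears in the post, j is never incremented, so it keeps overwriting neighbor_position[0].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host stand-in for CUDA's float3.
struct Float3 {
    float x, y, z;
};

const int LOCAL_ARRAY_SIZE = 2;

// First version: loop over the whole input, guarded by the if.
// As posted, j is never incremented, so every iteration writes slot 0.
Float3 global_loop_version(const std::vector<Float3>& vehicle_forward) {
    Float3 neighbor_position[LOCAL_ARRAY_SIZE];
    neighbor_position[0] = Float3{0.0f, 0.0f, 0.0f};
    int j = 0;
    for (std::size_t i = 0; i < vehicle_forward.size(); ++i) {
        if (j < LOCAL_ARRAY_SIZE) {
            neighbor_position[j] = vehicle_forward[i];  // j is 0 here, inside the local array
        }
    }
    return neighbor_position[0];
}

// Second version: loop bounded by LOCAL_ARRAY_SIZE.
// Needs at least LOCAL_ARRAY_SIZE input elements.
Float3 local_loop_version(const std::vector<Float3>& vehicle_forward) {
    assert(vehicle_forward.size() >= static_cast<std::size_t>(LOCAL_ARRAY_SIZE));
    Float3 neighbor_position[LOCAL_ARRAY_SIZE];
    neighbor_position[0] = Float3{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < LOCAL_ARRAY_SIZE; ++i) {
        neighbor_position[i] = vehicle_forward[i];
    }
    return neighbor_position[0];
}
```

With three input elements whose x values are 1, 2 and 3, the first version returns the element with x == 3 and the second the element with x == 1; neither reads or writes outside neighbor_position, which suggests looking at the launch itself (indexing, resources) rather than at the loop logic.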



Does no one know why the kernel crashes?

I have attached a complete program that uses the kernel posted above. Could someone please compile and run it so I know whether this is just a local problem or a general problem with the current CUDA version?

It could be that the device runs out of registers when you run it with the “if” and 512 threads per block. Try running it with fewer threads per block and more blocks (if you want the same total number of threads) and see if that works.
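Running with fewer threads per block while keeping the total thread count means computing the block count with ceiling division. A minimal host-side sketch follows; the helper name blocks_needed is hypothetical. Note that the kernel as posted computes its index from the constant BLOCK_SIZE rather than from blockDim.x, so the threads per block used at launch would have to equal BLOCK_SIZE for consecutive threads to map to consecutive, non-overlapping elements.

```cpp
#include <cassert>

// Number of blocks of threads_per_block threads needed so that at least n
// threads are launched in total (ceiling division).
unsigned int blocks_needed(unsigned int n, unsigned int threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

For 1024 elements this gives 2 blocks of 512 threads or 16 blocks of 64 threads. A launch might then look like determineCombinedSteering_kernel<<<blocks_needed(n, 64), 64>>>(n, d_forward, d_result), where d_forward and d_result are hypothetical device pointers.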


Thank you for the suggestion. :) I just tried it with 1 thread per block, but it still fails with the same error message.
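Because index is computed as BLOCK_SIZE * blockIdx.x + threadIdx.x with the constant BLOCK_SIZE = 64, the elements a launch touches depend on both that constant and the actual grid and block sizes. The host-side sketch below (hypothetical helper names) evaluates the largest index a given configuration produces. It assumes vehicle_forward and result each hold n elements; the attached program's actual sizes are not shown here.

```cpp
#include <cassert>

// Mirrors the kernel's index formula: BLOCK_SIZE * blockIdx.x + threadIdx.x.
// Returns the largest index any thread of the launch would compute.
unsigned long long max_kernel_index(unsigned int block_size_const,
                                    unsigned int grid_dim,
                                    unsigned int block_dim) {
    assert(grid_dim > 0 && block_dim > 0);
    return static_cast<unsigned long long>(block_size_const) * (grid_dim - 1)
           + (block_dim - 1);
}

// True if every index the launch computes is a valid element of an array of size n.
bool launch_stays_in_bounds(unsigned int block_size_const, unsigned int grid_dim,
                            unsigned int block_dim, unsigned int n) {
    return max_kernel_index(block_size_const, grid_dim, block_dim) < n;
}
```

For example, if 1024 elements were covered by 1024 blocks of 1 thread, the largest index would be 64 * 1023 = 65472, well outside the arrays. With 16 blocks of 64 threads it is 1023, and with 2 blocks of 512 threads it is 575: in bounds for 1024 elements, but the blocks overlap and elements above 575 are never written.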