CUDA Device not ready ERROR

Hi everyone,

I found a test program online (given below):

#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

void dummy_call()
{
    cudaEvent_t event;
    cudaError_t err;

    err = cudaSetDevice(0);
    assert(cudaSuccess == err);

    err = cudaEventCreate(&event);
    assert(cudaSuccess == err);

    /* Query an event that hasn't been recorded */
    err = cudaEventQuery(event);
    printf("Query unrecorded event: \t\t%s\n", cudaGetErrorString(err));

    /* Record the event */
    err = cudaEventRecord(event, 0);
    assert(cudaSuccess == err);

    /* Query the event again, we now expect cudaErrorNotReady */
    err = cudaEventQuery(event);
    printf("Query recorded but not occured event: \t%s\n", cudaGetErrorString(err));

    // Disparity map computation.
    dim3 num_threads(1, 1, 1);
    dim3 num_blocks(1, 1, 1);
    simple_kernel_call<<<num_blocks, num_threads>>>();

    /* Query the event again, we now expect cudaSuccess */
    err = cudaEventQuery(event);
    printf("Query recorded and occured event: \t%s\n", cudaGetErrorString(err));

    /* Release the event once it is no longer needed */
    cudaEventDestroy(event);
}
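Kernel launches and cudaEventRecord are both asynchronous with respect to the host, so cudaEventQuery can legitimately return cudaErrorNotReady (which cudaGetErrorString reports as "device not ready") when the GPU simply has not reached the event yet. A minimal sketch of a more deterministic version of the same test, which records the event after the kernel and blocks with cudaEventSynchronize before the final query. empty_kernel and query_event_after_kernel are hypothetical names, and the empty kernel is only a stand-in for simple_kernel_call, whose body is not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for simple_kernel_call (its body is not shown in the post).
__global__ void empty_kernel() {}

// Launches a kernel, records an event after it, queries the event without
// waiting, then waits and queries again. Writes the first (non-blocking) status
// to *status_before_wait and returns the status seen after waiting.
cudaError_t query_event_after_kernel(cudaError_t* status_before_wait)
{
    cudaEvent_t done;
    cudaError_t err = cudaEventCreate(&done);
    if (err != cudaSuccess) return err;

    empty_kernel<<<1, 1>>>();
    err = cudaGetLastError();           // catches launch-configuration errors
    if (err != cudaSuccess) { cudaEventDestroy(done); return err; }

    err = cudaEventRecord(done, 0);     // completes after all earlier work in stream 0
    if (err != cudaSuccess) { cudaEventDestroy(done); return err; }

    // Non-blocking: cudaErrorNotReady here only means the event is still pending.
    *status_before_wait = cudaEventQuery(done);
    printf("Before wait: %s\n", cudaGetErrorString(*status_before_wait));

    err = cudaEventSynchronize(done);   // blocks the host until the event completes
    if (err == cudaSuccess) err = cudaEventQuery(done);
    printf("After wait:  %s\n", cudaGetErrorString(err));

    cudaEventDestroy(done);
    return err;
}
```

With this ordering, the status before the wait may be either cudaSuccess or cudaErrorNotReady depending on timing, while the status after cudaEventSynchronize should be cudaSuccess unless something actually failed.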


When I run it, I get the following output:

I’ve run a sample test program and it detects the device. Could someone please suggest a way to fix this?
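To confirm that the runtime sees the card independently of the event test, a small device query can help. A minimal sketch; the helper name list_cuda_devices is hypothetical, while cudaGetDeviceCount and cudaGetDeviceProperties are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints each visible CUDA device with its compute capability.
// Returns the device count, or -1 if the runtime reports an error
// (for example, a driver/runtime mismatch).
int list_cuda_devices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            printf("Device %d: %s (compute capability %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
        }
    }
    return count;
}
```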

Many thanks for the help.

Bhanu Kiran Challa

My system specifications:

OS: openSUSE 11.4 (x86_64)
Graphics card: NVIDIA GeForce GTX 465
Driver: 275.09.07
CUDA Toolkit(s) installed:


I had a similar problem.

The solution is that

should come after

The reason for this is explained in this thread:

Hi Veda,

I inserted the code you suggested and ran the program again. There is no improvement; I’m still getting the same error.

But thanks for the tip. I’ve learned something new.

Bhanu Kiran Challa

On exactly which line are you getting this error?

Can you post the kernel code for simple_kernel_call<<<num_blocks, num_threads>>>()?


I’m getting the error at this line:

printf("Query recorded and occured event: \t%s\n", cudaGetErrorString(err));
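That printf only reports the status returned by the preceding cudaEventQuery call. A common pattern is to treat cudaErrorNotReady ("device not ready") as "work still running" rather than as a failure, and to abort only on other statuses. A minimal sketch; the helper name event_completed is hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Returns true if all work captured by `event` has completed, false if it is
// still pending (cudaErrorNotReady). Any other status is a real error: it is
// printed and the program exits.
bool event_completed(cudaEvent_t event)
{
    cudaError_t err = cudaEventQuery(event);
    if (err == cudaSuccess) return true;
    if (err == cudaErrorNotReady) return false;
    fprintf(stderr, "cudaEventQuery failed: %s\n", cudaGetErrorString(err));
    exit(EXIT_FAILURE);
}
```

A caller can poll event_completed while doing other host work, or call cudaEventSynchronize when it simply needs to wait.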

Hope this helps.

Bhanu Kiran Challa

Hi Veda,

I’m launching a kernel on a 2D grid of 3D blocks.

The following code writes each thread’s linear index into the array element that thread accesses.

Here is the code …

int idx = (blockIdx.x * blockDim.x) + threadIdx.x;
int idy = (blockIdx.y * blockDim.y) + threadIdx.y;
int idz = (blockIdx.z * blockDim.z) + threadIdx.z;

// Cast before multiplying so the product is computed in long rather than int.
long index = ((long)idx * block_height * block_depth) + ((long)idy * block_depth) + idz;
thread_index[index] = index;
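A self-contained sketch of the indexing above, under stated assumptions: `height` and `depth` are taken to be the full y and z extents of the array (the post's `block_height` and `block_depth` are not defined), the array holds width × height × depth elements, and the grid is 2D with the whole z extent inside each block. The names flat_index, fill_thread_index and launch_fill are hypothetical. The bounds check matters because threads in partially filled edge blocks would otherwise write past the end of the array, which can surface later as an error from an unrelated CUDA call.

```cuda
#include <cuda_runtime.h>

// Linear offset for an array laid out with z fastest, then y, then x:
//   offset = x * height * depth + y * depth + z
__host__ __device__ long flat_index(int x, int y, int z, int height, int depth)
{
    return (long)x * height * depth + (long)y * depth + z;
}

// Each thread writes its own linear index into the element it owns.
__global__ void fill_thread_index(long* thread_index, int width, int height, int depth)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    int idz = blockIdx.z * blockDim.z + threadIdx.z;  // blockIdx.z is 0 for a 2D grid

    // Skip threads that fall outside the array in partially filled blocks.
    if (idx >= width || idy >= height || idz >= depth) return;

    long index = flat_index(idx, idy, idz, height, depth);
    thread_index[index] = index;
}

// Hypothetical launcher: 2D grid of 3D blocks. Assumes 8 * 8 * depth stays
// within the 1024-threads-per-block limit of this GPU generation (depth <= 16).
cudaError_t launch_fill(long* d_thread_index, int width, int height, int depth)
{
    dim3 threads(8, 8, depth);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y,
                1);
    fill_thread_index<<<blocks, threads>>>(d_thread_index, width, height, depth);
    return cudaGetLastError();  // reports launch-configuration errors
}
```

For example, with height = 4 and depth = 3, the element at (x, y, z) = (2, 3, 2) lands at offset 2 * 12 + 3 * 3 + 2 = 35.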


Many thanks for your time.

Bhanu Kiran Challa