Runtime API

Hi friends,
My name is Sailesh.
I am new to CUDA and I need some help.

My code compiles fine and generates a .out file, but when I execute it, it shows:

cudaSafeCall() Runtime API error…

I don't know why I am getting this message.

Please help me, guys.

Sailesh

I assume you mean CUDA_SAFE_CALL from cutil.h? tmurray will be along shortly with a baseball bat…

Anyway, that means that something failed. Can you get any of the SDK examples to run (both in release and debug modes)? Do you get any more error messages?

Thanks for responding.

I can run all the SDK examples with no error messages.

I get this error only when I run my own code.

Below is the part of my code where it fails:

size_var = 4*N*sizeof(float);
cutilSafeCall(cudaMalloc((void**)&gpu_x_k_mat,size_var));
cutilSafeCall(cudaMemcpy(gpu_x_k_mat,x_k_mat,size_var,cudaMemcpyHostToDevice));

size_var = bins*bins*bins*sizeof(float);
cutilSafeCall(cudaMalloc((void**)&gpu_model_mat,size_var));
cutilSafeCall(cudaMemcpy(gpu_model_mat,model_mat,size_var,cudaMemcpyHostToDevice));

size_var = 2*sizeof(float);
cutilSafeCall(cudaMalloc((void**)&gpu_scale_for_kernel,size_var));
cutilSafeCall(cudaMemcpy(gpu_scale_for_kernel,scale_for_kernel,size_var,cudaMemcpyHostToDevice));

size_var = N*sizeof(float);
cutilSafeCall(cudaMalloc((void**)&gpu_w_k,size_var));

size_var = sizeof(int);
cutilSafeCall(cudaMalloc((void**)&gpu_bin,size_var));
cutilSafeCall(cudaMemcpy(gpu_bin,bin,size_var,cudaMemcpyHostToDevice));

size_var = sizeof(int);
cutilSafeCall(cudaMalloc((void**)&gpu_Na,size_var));
cutilSafeCall(cudaMemcpy(gpu_Na,Na,size_var,cudaMemcpyHostToDevice));

pftrack_kernel<<<1, N>>>(gpu_x_k_mat,gpu_image_matR,gpu_image_matG,gpu_image_matB,gpu_model_mat,gpu_scale_for_kernel,bin,Na,gpu_w_k);

size_var = N*sizeof(float);
cutilSafeCall(cudaMemcpy(w_k,gpu_w_k,size_var,cudaMemcpyDeviceToHost)); // <-- the error is reported here

I get the error only on the last line, the final cudaMemcpy.

What could be the reason?

Your kernel probably has a segfault in it, and cudaThreadSynchronize would return unspecified launch failure if you were calling it. Since you’re not, cudaMemcpy is doing that instead.

Now, get rid of cutil and check your errors yourself with cudaGetLastError() and cudaGetErrorString().
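A minimal sketch of what checking errors yourself could look like. The CHECK_CUDA macro and scale_kernel are hypothetical names made up for illustration (they are not part of cutil or the CUDA runtime), and the kernel is only a stand-in for the real one. The sketch uses cudaDeviceSynchronize, the current name for what this thread calls cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from cutil): print the runtime's error string
// together with the file and line of the failing call, then exit.
#define CHECK_CUDA(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void scale_kernel(float *w, int n)
{
    int i = threadIdx.x;
    if (i < n) w[i] = 2.0f * i;  // bounds check guards against out-of-range writes
}

int main()
{
    const int N = 64;
    float w_k[N];
    float *gpu_w_k = NULL;

    CHECK_CUDA(cudaMalloc((void **)&gpu_w_k, N * sizeof(float)));

    scale_kernel<<<1, N>>>(gpu_w_k, N);
    CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK_CUDA(cudaMemcpy(w_k, gpu_w_k, N * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(gpu_w_k));

    printf("w_k[3] = %f\n", w_k[3]);
    return 0;
}
```

With the synchronize call in place, a failure inside the kernel is reported right after the launch instead of surfacing later at the next cudaMemcpy.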

I am getting an “unspecified launch failure” error.

I have now removed cutilSafeCall and used CUDA_SAFE_CALL instead.

Now the array that I copy back from the GPU contains “NaN”.

Does that mean a segfault?

Quoting tmurray (May 20 2009, 10:51 PM): “Your kernel probably has a segfault in it, and cudaThreadSynchronize would return unspecified launch failure if you were calling it. Since you’re not, cudaMemcpy is doing that instead.”

Thanks for your reply…

Again waiting for reply… :D

As I said in another thread, recompile in device emulation mode (emu=1 dbg=1, if you’re using common.mk), and run through valgrind.

I have this kernel invocation inside a for loop.

When I compile it in deviceemu mode, it runs correctly for 84 iterations, but right after the 84th iteration I get “NaN” again.

When I compile it for the GPU, I cannot complete even one iteration.

Can't we put a kernel invocation in a for loop?

Does that cause memory-related errors?

What did valgrind say? Running kernels in loops shouldn’t be a problem.
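To narrow down which iteration goes wrong, one option is to check for errors and scan the results for NaN after every launch. The following is only a sketch under assumptions: step_kernel, N, and ITERATIONS are hypothetical placeholders, not the code from this thread.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (not the real one): each launch halves every weight.
__global__ void step_kernel(float *w, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) w[i] *= 0.5f;
}

int main()
{
    const int N = 128;
    const int ITERATIONS = 100;
    float w_k[N];
    for (int i = 0; i < N; ++i) w_k[i] = 1.0f;

    float *gpu_w_k = NULL;
    cudaError_t err = cudaMalloc((void **)&gpu_w_k, N * sizeof(float));
    if (err == cudaSuccess)
        err = cudaMemcpy(gpu_w_k, w_k, N * sizeof(float),
                         cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "setup failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int it = 0; it < ITERATIONS; ++it) {
        step_kernel<<<1, N>>>(gpu_w_k, N);

        // Check the launch, then the execution, then the copy back.
        err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err == cudaSuccess)
            err = cudaMemcpy(w_k, gpu_w_k, N * sizeof(float),
                             cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) {
            fprintf(stderr, "iteration %d: %s\n", it, cudaGetErrorString(err));
            break;
        }

        // Stop at the first NaN so the failing iteration is known.
        bool found_nan = false;
        for (int i = 0; i < N; ++i) {
            if (std::isnan(w_k[i])) {
                fprintf(stderr, "iteration %d: NaN at index %d\n", it, i);
                found_nan = true;
                break;
            }
        }
        if (found_nan) break;
    }

    cudaFree(gpu_w_k);
    return 0;
}
```

Copying the results back on every iteration is slow, so this is meant for debugging only.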