Hello,
I have written a C program (depth-first search) for solving a puzzle. I used the CUDA profiler to test the program, and the profiler shows that only the first cudaMalloc is executed and nothing more (no kernel …). In reality, the program executes the cudaMalloc calls and
the first kernel call at point 6 (pseudo-code below), but after it tries to execute kernel calls 2, 3 and 4 at point 9, the profiler shows
no kernel execution info at all. If I remove kernel calls 2, 3 and 4, the profiler shows the execution of kernel call 1.
Pseudo-code of the problematic part:
1. pointer1 = cudaMalloc(500 KB of global GPU memory)
2. pointer2 = cudaMalloc(4 bytes of global GPU memory, holding an unsigned int used to pass a flag to and from GPU memory)
3. jump to a subfunction
4. unsigned int a = 1
5. cudaMemcpy(pointer2, &a, 4, HostToDevice)
6. kernel call bla<<<16,9>>>(x, y, pointer1, pointer2) // tests something; if a thread detects an error, it sets *pointer2 = 0
7. cudaMemcpy(&a, pointer2, 4, DeviceToHost)
8. continue only if a == 1, else return
9. three kernel calls like this (the program does reach this position):
   bla2<<<16,350>>>(x, y, pointer1, start)
   recalculate new "start"
   bla2<<<16,350>>>(x, y, pointer1, start)
   recalculate new "start"
   bla2<<<16,350>>>(x, y, pointer1, start)
None of the three kernels at point 9 is executed. If I put one of the three calls on its own as the first line of the subfunction and return right after it,
it is called!
This works:
subfunction start:
bla2<<<16,350>>>(x, y, pointer1, start)
return 0
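The numbered steps of the pseudo-code can be sketched in CUDA roughly as follows, with a check after every launch so that a kernel that fails to start is reported instead of being skipped silently. The kernel bodies, the types of x and y, the helper launchOk and the way "start" is recalculated are all placeholders assumed for illustration, not the real code. The sketch uses cudaDeviceSynchronize, the current name of the call; on CUDA 2.0 the equivalent is cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels: the real bodies of bla and bla2 are not shown in the post.
__global__ void bla(int x, int y, unsigned char *buf, unsigned int *flag)
{
    // A thread that detects an error clears the flag; this stub only does so
    // if it is handed a null buffer.
    if (buf == 0) *flag = 0;
}

__global__ void bla2(int x, int y, unsigned char *buf, unsigned int start)
{
    unsigned int i = start + blockIdx.x * blockDim.x + threadIdx.x;
    buf[i] = (unsigned char)(x + y);
}

// Hypothetical helper: returns true if the last launch started and ran cleanly.
static bool launchOk(const char *what)
{
    cudaError_t err = cudaGetLastError();                    // launch-time errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();   // execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

static int subfunction(int x, int y, unsigned char *pointer1, unsigned int *pointer2)
{
    unsigned int a = 1;                                               // step 4
    cudaMemcpy(pointer2, &a, sizeof(a), cudaMemcpyHostToDevice);      // step 5
    bla<<<16, 9>>>(x, y, pointer1, pointer2);                         // step 6
    if (!launchOk("bla")) return -1;
    cudaMemcpy(&a, pointer2, sizeof(a), cudaMemcpyDeviceToHost);      // step 7
    if (a != 1) return 1;                                             // step 8

    unsigned int start = 0;
    for (int i = 0; i < 3; ++i) {                                     // step 9
        bla2<<<16, 350>>>(x, y, pointer1, start);
        if (!launchOk("bla2")) return -1;
        start += 16 * 350;   // placeholder for "recalculate new start"
    }
    return 0;
}

int main()
{
    unsigned char *pointer1 = 0;
    unsigned int *pointer2 = 0;
    cudaMalloc((void **)&pointer1, 500 * 1024);              // step 1
    cudaMalloc((void **)&pointer2, sizeof(unsigned int));    // step 2
    int rc = subfunction(1, 2, pointer1, pointer2);          // step 3
    printf("subfunction returned %d\n", rc);
    cudaFree(pointer1);
    cudaFree(pointer2);
    return rc == 0 ? 0 : 1;
}
```

If one of the bla2 launches is rejected, for example because 350 threads per block need more registers than one multiprocessor offers, the message from cudaGetErrorString should show it right at that launch.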
Is it true that the host program, after calling a kernel, waits until ALL threads in it have finished? If not, how can I achieve this?
It is important, because the code should only continue if the test at point 6 (pseudo-code) has passed, and the test result is
only final once all threads have finished.
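For reference, a minimal sketch of how the wait can be made explicit. In the CUDA runtime a kernel launch is asynchronous: control returns to the host before the threads have finished. A blocking cudaMemcpy on the default stream waits for earlier kernels, and an explicit synchronize call does the same while also returning any error from the run. The kernel test and its flag logic are hypothetical; cudaDeviceSynchronize is the current name, and CUDA 2.0 offers cudaThreadSynchronize for the same purpose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: any thread that finds a problem clears the flag.
__global__ void test(unsigned int *flag, unsigned int limit)
{
    unsigned int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= limit) *flag = 0;
}

int main()
{
    unsigned int a = 1;
    unsigned int *d_flag = 0;
    cudaMalloc((void **)&d_flag, sizeof(a));
    cudaMemcpy(d_flag, &a, sizeof(a), cudaMemcpyHostToDevice);

    test<<<16, 9>>>(d_flag, 16 * 9);   // returns to the host immediately

    // Explicit wait: blocks until every thread of the kernel has finished
    // and reports errors that occurred while it ran.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This blocking copy would also have waited for the kernel on its own.
    cudaMemcpy(&a, d_flag, sizeof(a), cudaMemcpyDeviceToHost);
    printf("test %s\n", a == 1 ? "passed" : "failed");

    cudaFree(d_flag);
    return 0;
}
```

With this pattern the value of a read on the host reflects the writes of all threads, so the decision whether to continue is made on a finished result.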
How can I calculate the memory usage of a kernel? Is the kernel code stored in global GPU memory, and are the variables
stored in the shared memory of the thread processor the kernel runs on?
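One way to see what a kernel needs is to ask the runtime, sketched below with cudaFuncGetAttributes. This assumes a toolkit newer than the CUDA 2.0 listed in this post (I believe the call is not available in 2.0); at compile time, nvcc --ptxas-options=-v prints similar figures. The kernel body is a placeholder, not the real bla2.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel with the same signature as bla2 in the pseudo-code.
__global__ void bla2(int x, int y, unsigned char *buf, unsigned int start)
{
    unsigned int i = start + blockIdx.x * blockDim.x + threadIdx.x;
    buf[i] = (unsigned char)(x + y);
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, bla2);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Per-kernel resource use as compiled for the current device.
    printf("registers per thread : %d\n", attr.numRegs);
    printf("static shared memory : %zu bytes per block\n", attr.sharedSizeBytes);
    printf("local memory         : %zu bytes per thread\n", attr.localSizeBytes);
    printf("constant memory      : %zu bytes\n", attr.constSizeBytes);
    printf("max threads per block: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```

The last value is the interesting one for a launch like <<<16,350>>>: if maxThreadsPerBlock comes out below 350 for the real kernel, the launch cannot succeed with that block size.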
Is it necessary to wrap every cudaMalloc and cudaMemcpy in the CUDA_SAFE_CALL macro?
And why cudaThreadSynchronize?
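The API does not require such a wrapper, but every runtime call returns a cudaError_t, and without a check a failure goes unnoticed. A minimal sketch of a checking macro follows; CHECK_CUDA is a hypothetical name made up for this example and is not the SDK's CUDA_SAFE_CALL.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

/* Hypothetical macro: evaluates a runtime call once and aborts with
   file, line and the runtime's error string if it did not succeed. */
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

int main()
{
    unsigned int a = 1;
    unsigned int *d_a = 0;

    // Round trip of a 4-byte value, with every call checked.
    CHECK_CUDA(cudaMalloc((void **)&d_a, sizeof(a)));
    CHECK_CUDA(cudaMemcpy(d_a, &a, sizeof(a), cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(&a, d_a, sizeof(a), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(d_a));

    printf("a = %u\n", a);
    return 0;
}
```

A kernel launch returns no status itself, so after a launch the same macro can be applied to cudaGetLastError().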
My system:
Athlon 64 X2 with 2 GB RAM
OS: Fedora 8, 64-bit
CUDA driver 177.73 (64-bit), CUDA SDK 2.0, profiler 1.0
GPU: GeForce 9800 GTX, 512 MB
Sorry, I'm new to GPU coding and the tutorials do not always answer all questions.