Question about resuming kernel execution

Hi all,

I have a program like this:

  1. Run kernel1. After it finishes a certain amount of work, stop it and save the context (everything needed to resume the kernel) from shared memory to global memory, which is allocated beforehand in a host function.

  2. Run kernel2 several times.

  3. Run kernel1 again. Inside kernel1's __global__ function, copy the context from global memory back to shared memory to restore the execution environment.
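To make the three steps above concrete, here is a minimal sketch of the save/restore pattern. It is not my real code: the `KernelContext` layout, the names `resumable_kernel` and `run_in_two_launches`, and the single-block launch are all assumptions made for illustration. The context stores an index rather than a shared-memory address, since shared memory contents do not persist between launches.

```cuda
#include <cuda_runtime.h>

// Hypothetical context: the real program's context layout is not shown here.
struct KernelContext {
    int next_item;    // index of the next work item to process
    int partial_sum;  // running result carried across launches
};

// Stand-in for "kernel1": restore the context from global memory into shared
// memory, do a bounded amount of work, then save the context back.
__global__ void resumable_kernel(KernelContext *saved, const int *data,
                                 int n, int items_per_launch)
{
    __shared__ KernelContext ctx;

    if (threadIdx.x == 0) ctx = saved[0];  // restore (global -> shared)
    __syncthreads();

    if (threadIdx.x == 0) {
        int end = min(ctx.next_item + items_per_launch, n);
        for (int i = ctx.next_item; i < end; ++i)
            ctx.partial_sum += data[i];
        ctx.next_item = end;
        saved[0] = ctx;                    // save (shared -> global)
    }
}

// Host side: allocate the buffers once, launch twice, free at the end.
int run_in_two_launches(const int *h_data, int n)
{
    KernelContext h_ctx = {0, 0};
    KernelContext *d_ctx = nullptr;
    int *d_data = nullptr;
    cudaMalloc(&d_ctx, sizeof(KernelContext));
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_ctx, &h_ctx, sizeof(h_ctx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);

    int half = (n + 1) / 2;
    resumable_kernel<<<1, 32>>>(d_ctx, d_data, n, half);  // first part
    // other kernels (the "kernel2" of step 2) would be launched here
    resumable_kernel<<<1, 32>>>(d_ctx, d_data, n, half);  // resume

    cudaMemcpy(&h_ctx, d_ctx, sizeof(h_ctx), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_ctx);
    return h_ctx.partial_sum;
}
```

Global-memory pointers passed as kernel arguments stay valid across launches as long as the allocation is not freed, so in this sketch only the context itself needs to be copied.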

My problem is:

All of the context is copied and used correctly when kernel1 is resumed, except for one pointer, which can no longer be used. It points to a global memory buffer that the host allocates for kernel1 at the very beginning of the program and frees only at the very end. The buffer exists for the whole time kernel1 and kernel2 run, and kernel2 does not use it.

I compared the pointer values from the two runs of kernel1, and they look the same. I also compared the contents of the buffer the pointer points to, and they are not changed by kernel2.
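For reference, the buffer comparison can be done roughly as in the sketch below: snapshot the device buffer to the host before and after the other kernel, then compare the two copies. The names `snapshot`, `unrelated_kernel` and `buffer_unchanged_demo` are hypothetical, and the fill pattern and sizes are arbitrary choices for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Copy `bytes` bytes from a device buffer into a host vector.
static std::vector<unsigned char> snapshot(const void *d_buf, size_t bytes)
{
    std::vector<unsigned char> h(bytes);
    cudaMemcpy(h.data(), d_buf, bytes, cudaMemcpyDeviceToHost);
    return h;
}

// Stand-in for "kernel2": writes only to its own buffer.
__global__ void unrelated_kernel(int *other, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) other[i] = i;
}

// Returns true if d_buf holds the same bytes before and after the kernel.
bool buffer_unchanged_demo()
{
    const int n = 256;
    const size_t bytes = n * sizeof(int);
    int *d_buf = nullptr, *d_other = nullptr;
    cudaMalloc(&d_buf, bytes);
    cudaMalloc(&d_other, bytes);
    cudaMemset(d_buf, 0x5A, bytes);          // known pattern in the buffer

    std::vector<unsigned char> before = snapshot(d_buf, bytes);
    unrelated_kernel<<<(n + 63) / 64, 64>>>(d_other, n);
    cudaDeviceSynchronize();                 // wait for the kernel to finish
    std::vector<unsigned char> after = snapshot(d_buf, bytes);

    cudaFree(d_other);
    cudaFree(d_buf);
    return before == after;
}
```

A check like this only shows that the buffer contents survived; it says nothing about other memory the second kernel may have touched.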

Does anybody know what the reason is?

Thanks in advance~

Susan

Is all of this happening inside one host thread, or is the host code multithreaded?

Thanks for your reply.

Everything happens in one host thread; the host program is not multithreaded. If I comment out kernel2, the error does not occur, and the second execution of kernel1 is correct.

I found the error using cuda-gdb: when the pointer is used in the second call of kernel1, the program terminates with the following error message:

Program received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.

[Switching to CUDA Kernel 21 (<<<(0,0),(0,0,0)>>>)]

0x0000000001725338 in fill_rn_buff <<<(4,1),(4,1,1)>>> (gen=0x5320000, d_shared=0x5420000, count_per_thread=192, rn_buff=0x5220000, flag=warning: Variable is not live at this point. Returning garbage value.

1) at lfg_kernel.cu:426

Thanks!
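Since the failure only appears when kernel2 runs in between, one way to narrow it down is to check the runtime status after every launch, so that an error raised by an earlier kernel is reported at that launch rather than later. The following is only a sketch: `launch_ok`, `kernel2_stub` and `checked_launch_demo` are hypothetical names, and the stub kernel stands in for the real kernel2, which is not shown in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns true if the most recent launch started and ran without error;
// otherwise prints the CUDA error string and returns false.
static bool launch_ok(const char *label)
{
    cudaError_t err = cudaGetLastError();      // launch/configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors raised while running
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Stand-in for kernel2; the bounds check keeps writes inside the buffer.
__global__ void kernel2_stub(int *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

bool checked_launch_demo()
{
    const int n = 64;
    int *d_buf = nullptr;
    if (cudaMalloc(&d_buf, n * sizeof(int)) != cudaSuccess) return false;

    kernel2_stub<<<2, 32>>>(d_buf, n);
    bool ok = launch_ok("kernel2_stub");       // check right after the launch

    cudaFree(d_buf);
    return ok;
}
```

Calling a check like this after each kernel1 and kernel2 launch would show which launch first reports a problem.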
