Question about the order of function calls on the GPU

dinosaur0423 · June 24, 2019, 11:32am

Hello,
I am working on a GTX 1070 Ti and have recently run into a problem. For example, in a do-loop I have two __global__ void functions, “A” and “B”. Function A collects some information, which is then used and finally cleaned up in B. A and B are launched with the same number of threads, but that number can vary from one iteration to the next (anywhere from 0 to 30000 data items).
My question is: some iterations have only one data item to process. I launch A and B (each with 1 thread) in sequence. Will the GPU also execute A and B in that order, or do I have to call cudaDeviceSynchronize() between them to enforce it?
I am also wondering: if I create a cudaStream_t and launch both A and B into that same stream, is that sufficient to ensure the GPU executes A and then B?
Thanks!

Robert_Crovella · June 24, 2019, 12:03pm

Kernels launched into the default stream will serialize: B will not begin until A has completed. There is no need to put a cudaDeviceSynchronize() in between.

Programming Guide :: CUDA Toolkit Documentation
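The ordering described above can be sketched as follows. This is a minimal sketch, assuming hypothetical kernels kernelA and kernelB standing in for A and B, and hypothetical per-iteration sizes. It also touches on the cudaStream_t question: work issued to a single stream, including one created with cudaStreamCreate, executes in issue order, so B starts only after A finishes. A host-side wait (here cudaStreamSynchronize) is needed only before the CPU reads results. Because the thread count can be 0 in some iterations, the sketch skips those launches, since a launch with a grid of 0 blocks is rejected as an invalid configuration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel "A": writes one value per active thread.
__global__ void kernelA(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i + 1;
}

// Hypothetical stand-in for kernel "B": uses what A wrote, then clears it.
__global__ void kernelB(int *buf, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2 * buf[i];
        buf[i] = 0;
    }
}

// Runs a few loop iterations with A and B in the same stream; returns out[0].
int run_pipeline() {
    const int maxN = 30000;
    int *buf = nullptr, *out = nullptr;
    cudaMalloc(&buf, maxN * sizeof(int));
    cudaMalloc(&out, maxN * sizeof(int));

    cudaStream_t s;
    cudaStreamCreate(&s);

    const int counts[] = {1, 0, 30000};  // hypothetical per-iteration sizes
    for (int n : counts) {
        if (n == 0) continue;  // a 0-block launch fails with cudaErrorInvalidConfiguration
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        kernelA<<<blocks, threads, 0, s>>>(buf, n);
        // Same stream: kernelB does not start until kernelA has finished.
        kernelB<<<blocks, threads, 0, s>>>(buf, out, n);
    }

    // Only the host needs to wait, and only before it reads the results.
    int h_out = 0;
    cudaMemcpyAsync(&h_out, out, sizeof(int), cudaMemcpyDeviceToHost, s);
    cudaStreamSynchronize(s);

    cudaStreamDestroy(s);
    cudaFree(buf);
    cudaFree(out);
    return h_out;
}

int main() {
    printf("out[0] = %d\n", run_pipeline());  // on a working device: out[0] = 2
    return 0;
}
```

The same program with every `s` replaced by `0` (the default stream) would give the same ordering between A and B; a dedicated stream mainly matters once other, independent work should be allowed to overlap.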

dinosaur0423 · June 25, 2019, 8:15am

Hello,
Thank you for the reply. This is really helpful.
Yesterday I added a cudaDeviceSynchronize(), and the program then ran correctly; the data were unchanged.
Before that, I had run cuda-memcheck and it reported no memory errors. I also checked the data, and they were correct. So I wonder whether the error is caused by my own code, since my program is built from several object files and A and B are in different ones.
Thanks again!
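One way to narrow down a problem like this is to check the status of every kernel launch, since a launch itself returns no error code. The following is an illustrative sketch, not a diagnosis of this program: kernelA, run_checked, and the DEBUG_SYNC switch are hypothetical names. cudaGetLastError() reports errors detected at launch time (for example an invalid grid size), while the optional cudaDeviceSynchronize() makes errors that occur during execution show up at the launch site rather than at some later API call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel "A".
__global__ void kernelA(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i + 1;
}

// Launches kernelA for n items (assumes n <= 30000) and returns the
// CUDA error code as an int (0 == cudaSuccess).
int run_checked(int n) {
    int *buf = nullptr;
    cudaError_t err = cudaMalloc(&buf, 30000 * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return (int)err;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // 0 when n == 0
    kernelA<<<blocks, threads>>>(buf, n);

    // Reports errors detected at launch time, e.g. a 0-block grid.
    err = cudaGetLastError();
#ifdef DEBUG_SYNC
    // Debug builds only: wait so that errors raised while the kernel
    // was running are reported here instead of at a later call.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
#endif

    cudaFree(buf);
    if (err != cudaSuccess)
        fprintf(stderr, "kernelA (n=%d): %s\n", n, cudaGetErrorString(err));
    return (int)err;
}

int main() {
    printf("n=1000 -> %d\n", run_checked(1000));
    printf("n=0    -> %d\n", run_checked(0));  // nonzero: the 0-block launch is rejected
    return 0;
}
```

Compiling with -DDEBUG_SYNC adds a synchronization after each checked launch, which costs performance, so it is usually left off in release builds.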
