When should cudaThreadSynchronize() be called?

Hi, when should cudaThreadSynchronize() be called, and what does this function do?

In the code below, there are three cudaThreadSynchronize() calls. Which of them are redundant, and which are necessary? Don't kernel launches call cudaThreadSynchronize() automatically?

for(int i=0; i<NumberOfSimualtion; i++)
{
	for(int j=0; j<NumberOfSteps; j++)
	{
		kernel1<<<60,32>>>(a,b,c);
		cudaThreadSynchronize();
		kernel2<<<60,32>>>(d,e,f);
		cudaThreadSynchronize();
	}
}
cudaThreadSynchronize();
printf("Done");

Thanks

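The first two are redundant: kernels launched into the same stream execute in launch order, so kernel2 does not start until kernel1 has finished, and no code runs on the CPU between the two launches.

The third call does serve a purpose. Kernel launches are asynchronous and return control to the CPU immediately, so without that call the “Done” message would appear before execution has actually completed.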
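
As a rough illustration of the point above, here is a minimal sketch that launches two kernels back to back with no synchronization between them and synchronizes only once before using the result. The kernels addOne and timesTwo, the helper runSteps, and the buffer d_data are hypothetical stand-ins for kernel1, kernel2 and their arguments, which the question does not show. The sketch also uses cudaDeviceSynchronize(), which replaced the deprecated cudaThreadSynchronize() in later CUDA releases; treat it as an example under those assumptions, not as the only way to do this.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel1: adds 1 to every element.
__global__ void addOne(int *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1;
}

// Hypothetical stand-in for kernel2: doubles every element.
__global__ void timesTwo(int *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= 2;
}

// Runs numberOfSteps iterations and returns the first element,
// or -1 if any CUDA call reports an error.
int runSteps(int numberOfSteps)
{
    const int n = 60 * 32;  // one element per thread, matching <<<60,32>>>
    int *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_data, 0, n * sizeof(int));

    for (int j = 0; j < numberOfSteps; j++) {
        // Both launches go into the default stream, so timesTwo only
        // starts after addOne has finished. No sync is needed here.
        addOne<<<60, 32>>>(d_data, n);
        timesTwo<<<60, 32>>>(d_data, n);
    }

    // Single synchronization point: wait for all queued kernels and
    // pick up any error raised while they were running.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return -1;
    }

    int first = 0;
    cudaMemcpy(&first, d_data, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return first;
}

int main()
{
    // Starting from 0: (0+1)*2 = 2, (2+1)*2 = 6, (6+1)*2 = 14.
    printf("Done, data[0] = %d\n", runSteps(3));
    return 0;
}
```

If the kernels ran out of order, the value would differ from 14, so the result doubles as a check that stream ordering holds without the per-kernel synchronization calls. Note that the cudaMemcpy() here would also wait for the preceding kernels on its own; the explicit synchronization is kept mainly to report errors and to mirror the printf("Done") case from the question.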

Thanks. Very helpful.
