cudaDeviceSynchronize() returns cudaErrorMemoryAllocation

mutti.stefano.jp · February 2, 2018, 3:22pm

I am trying to call cudaDeviceSynchronize() inside a kernel that uses dynamic parallelism, so that the parent kernel waits for its child kernels to finish before continuing, like this:

__global__ void father(…){
. . .
. . .
__syncthreads();

if(condition met) //DIVERGENCE
{
  
  size_t shrbytesants = VALUE;

  child<<<1,n_pnts_ants,VALUE>>>(...);

  cudaDeviceSynchronize();
  cudaError err = cudaGetLastError();
  printf("error : %s \n",cudaGetErrorString( err ));

}

__syncthreads();
}

Without the cudaDeviceSynchronize() the code works, but the parents do not wait for the children to finish.
If I add cudaDeviceSynchronize(), the code compiles fine and executes without apparent errors, but cuda-memcheck reports "Program hit cudaErrorMemoryAllocation (error 2) due to "out of memory" on CUDA API call to cudaLaunch."
I can't tell whether I'm doing something wrong with stack or heap memory, but the fact that the code works without cudaDeviceSynchronize() suggests to me that the memory is OK.
Thanks

Robert_Crovella · February 2, 2018, 5:00pm

I suggest you read the entire CUDA Dynamic Parallelism (CDP) section in the programming guide. There are a number of considerations here, especially synchronization depth. Using cudaDeviceSynchronize() in a kernel affects the amount of state that must be kept, which in turn affects the memory required for the kernel launch.
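As a rough illustration of the limits that section describes, here is a minimal host-side sketch that queries and raises the device runtime limits before any parent kernel is launched. The CHECK macro and the values 3 and 4096 are assumptions chosen for illustration, not settings taken from the original code; the right values depend on how deeply the kernels nest and how many child launches can be outstanding. It also assumes a toolkit of the thread's era: device-side cudaDeviceSynchronize() was later deprecated, so newer toolkits may treat the sync depth limit differently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call and leave main().
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t e_ = (call);                                   \
        if (e_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s failed: %s\n", #call,              \
                    cudaGetErrorString(e_));                       \
            return 1;                                              \
        }                                                          \
    } while (0)

int main() {
    size_t syncDepth = 0, pendingLaunches = 0;

    // Read the current device runtime limits.
    CHECK(cudaDeviceGetLimit(&syncDepth, cudaLimitDevRuntimeSyncDepth));
    CHECK(cudaDeviceGetLimit(&pendingLaunches,
                             cudaLimitDevRuntimePendingLaunchCount));
    printf("before: sync depth %zu, pending launches %zu\n",
           syncDepth, pendingLaunches);

    // Assumed values for illustration only. A larger sync depth makes the
    // runtime reserve more memory for saved parent state, so it is worth
    // raising it only as far as the nesting actually requires.
    CHECK(cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, 3));
    CHECK(cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 4096));

    // Read the limits back to see what the runtime accepted.
    CHECK(cudaDeviceGetLimit(&syncDepth, cudaLimitDevRuntimeSyncDepth));
    CHECK(cudaDeviceGetLimit(&pendingLaunches,
                             cudaLimitDevRuntimePendingLaunchCount));
    printf("after:  sync depth %zu, pending launches %zu\n",
           syncDepth, pendingLaunches);

    // The parent kernel would be launched here, after the limits are set.
    return 0;
}
```

These limits have to be set from the host before the parent kernel is launched. If the memory reservation itself fails, cudaDeviceSetLimit() returns an error, which is easier to diagnose than a failure inside the kernel.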

I doubt there is anything useful that can be said about your incomplete snippet.
