Kernel and wait events

Hi everybody,

First of all, I am new to this forum, so please excuse my lack of knowledge!

I am trying to use asynchronous copies together with wait events, and I am having a lot of trouble understanding the behavior of my code.

I allocate pinned, mapped memory using:

float *Neurons_device, *Neurons_host;

cudaSetDeviceFlags(cudaDeviceMapHost);

// Allocate the memory on the host
cudaHostAlloc((void **)&Neurons_host, sizeNETWORK, cudaHostAllocMapped);

// Get the corresponding device pointer
cudaHostGetDevicePointer((void **)&Neurons_device, Neurons_host, 0);
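As a side note, my understanding is that mapped (zero-copy) memory requires device support, and that cudaSetDeviceFlags must be called before any CUDA context is created. A minimal check (a sketch, assuming device 0) would look like:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(void)
{
    // Verify that the device can map host memory before relying on
    // cudaHostAllocMapped / cudaHostGetDevicePointer (sketch, device 0 assumed).
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device 0 cannot map host memory\n");
        return EXIT_FAILURE;
    }
    // Must be set before the context is created (i.e. before any other CUDA call
    // that touches the device).
    cudaSetDeviceFlags(cudaDeviceMapHost);
    return EXIT_SUCCESS;
}
```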

Then, I use the following code to launch my kernels:

NetNeurons<<<1,NNEUR,0,streams[0]>>>(Neurons_device, d_OUT, 0);
cudaMemcpyAsync(OUT_TOT, d_OUT, bytes, cudaMemcpyDeviceToHost, streams[0]);

for (i = 1; i < NBLOCKS; i++) {
    NetNeurons<<<1,NNEUR,0,streams[i]>>>(Neurons_device, d_OUT + (i*NNEUR*TOTSTEP), i);
    cudaEventRecord(kernelEvent[i], streams[i]);
    cudaStreamWaitEvent(streams[i], kernelEvent[i-1], 0);
}

for (i = 1; i < NBLOCKS; i++)
    cudaMemcpyAsync(OUT_TOT + (i*NNEUR*TOTSTEP), d_OUT + (i*NNEUR*TOTSTEP), bytes, cudaMemcpyDeviceToHost, streams[i]);

cudaEventSynchronize(kernelEvent[NBLOCKS-1]);

In my kernel (NetNeurons), the variable Neurons_device is loaded, I do some computations, and I update Neurons_device so it holds the initial conditions for the next execution of the same kernel. I expected the pair cudaEventRecord(kernelEvent[i], streams[i]); and cudaStreamWaitEvent(streams[i], kernelEvent[i-1], 0); to serialize the kernel launches (streams[1], then streams[2], then …). I also believed (but apparently this is not the case!) that the next kernel would only read its inputs once the previous one had recorded its event. What I observe is that the next kernel does not use the updated version of Neurons_device, i.e. the initial conditions it needs for its call.
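To make the dependency chain I am trying to build explicit: block i must not start before block i-1 has finished writing Neurons_device. My current understanding (which may be wrong, hence this post) is that cudaStreamWaitEvent only affects work enqueued into the stream *after* the wait call, so the wait would have to be issued before the dependent kernel launch, and the first event would have to be recorded too. A sketch of that ordering, reusing my variable names (untested):

```cuda
// Sketch: serialize NetNeurons launches across streams with events (untested).
// The wait is issued BEFORE the dependent kernel is enqueued; a wait issued
// after the launch does not apply to it.
NetNeurons<<<1, NNEUR, 0, streams[0]>>>(Neurons_device, d_OUT, 0);
cudaEventRecord(kernelEvent[0], streams[0]);            // mark completion of block 0

for (i = 1; i < NBLOCKS; i++) {
    cudaStreamWaitEvent(streams[i], kernelEvent[i-1], 0);   // wait for block i-1
    NetNeurons<<<1, NNEUR, 0, streams[i]>>>(Neurons_device,
                                            d_OUT + (i*NNEUR*TOTSTEP), i);
    cudaEventRecord(kernelEvent[i], streams[i]);             // mark completion of block i
}
```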

Here is the code of the kernel. The __syncthreads() calls are not useful for the moment, but they will be in the near future!

__global__ void NetNeurons(float *IN, float *OUT, int iBlock)
{
	int idx = threadIdx.x;
	int i, j;
	float yout[NVarNEUR];
	float yin[NVarNEUR];
	float inputs[2];

	inputs[0] = 0.0f;
	inputs[1] = 0.0f;

	//__shared__ float Vtmp[NNEUR];
	//if (idx==0) for (i=0;i<NNEUR;i++) Vtmp[i]=IN.elements[i*NVarNEUR];

	// Load this thread's state from the mapped input buffer.
	for (i = 0; i < NVarNEUR; i++) yout[i] = IN[idx*NVarNEUR + i];
	OUT[idx] = IN[idx*NVarNEUR];
	__syncthreads();

	for (j = 1; j < TOTSTEP; j++) {
		if (((j + iBlock*TOTSTEP) < ((int) 500/DT)) && ((j + iBlock*TOTSTEP) >= ((int) 100/DT))) //<<<<<<--------
			inputs[0] = 0.05f * yout[0];
		else
			inputs[0] = 0.0f;

		for (i = 0; i < NVarNEUR; i++) yin[i] = yout[i];
		simpr(yin, yout, inputs);
		__syncthreads();

		//Vtmp[idx]=yout[0];
		OUT[idx + j*NNEUR] = yout[0];
		__syncthreads();
	}

	// Write the final state back so the next kernel call starts from it.
	for (i = 0; i < NVarNEUR; i++) IN[idx*NVarNEUR + i] = yout[i];
	__syncthreads();
}

I hope I am being clear… If someone has an idea, it would be most welcome!

Thank you,

Pierre