I have a question about the latency of reads and writes to pinned memory. I have a kernel inside a for loop that takes a pinned memory variable as an argument. The kernel executions are serialized using streams and events, so kernel i+1 only starts once kernel i has finished (using StreamWaitEvent(streams[i], events[i-1])). The kernel updates the pinned variable, which is then used as the initial condition for the next kernel. Since I wait for kernel i to finish before kernel i+1 runs, I expected the variable to be updated. That does not seem to be the case.
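For reference, here is a minimal sketch of the setup described above. It rests on assumptions that are not in the original post: the pinned variable is a single `int` allocated with `cudaHostAlloc(..., cudaHostAllocMapped)`, the kernel receives it as a device pointer obtained from `cudaHostGetDevicePointer`, and the system supports unified virtual addressing, so no `cudaSetDeviceFlags(cudaDeviceMapHost)` call is made. The kernel `step` and the helper `run_chain` are hypothetical names, and the kernel body simply increments the value. If the chain is serialized correctly, the final value equals the number of kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reads the value left by the previous launch and
// increments it. The argument is a device pointer into pinned host memory,
// so the value is read when the kernel runs, not when the launch is issued.
__global__ void step(int *state)
{
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        *state = *state + 1;
    }
}

// Hypothetical helper: chains n_kernels launches across n_kernels streams.
// Returns the final value of the pinned variable, or -1 on a CUDA error.
int run_chain(int n_kernels)
{
    int *h_state = nullptr;   // pinned, mapped host allocation
    int *d_state = nullptr;   // device-side alias of h_state
    if (cudaHostAlloc((void **)&h_state, sizeof(int), cudaHostAllocMapped) != cudaSuccess) return -1;
    if (cudaHostGetDevicePointer((void **)&d_state, h_state, 0) != cudaSuccess) {
        cudaFreeHost(h_state);
        return -1;
    }
    *h_state = 0;   // initial condition for the first kernel

    cudaStream_t *streams = new cudaStream_t[n_kernels];
    cudaEvent_t *events = new cudaEvent_t[n_kernels];
    for (int i = 0; i < n_kernels; ++i) {
        cudaStreamCreate(&streams[i]);
        // Timing is not needed here, so it is disabled.
        cudaEventCreateWithFlags(&events[i], cudaEventDisableTiming);
    }

    for (int i = 0; i < n_kernels; ++i) {
        if (i > 0) {
            // Work queued in streams[i] after this call does not start until
            // events[i-1] completes, i.e. until kernel i-1 has finished.
            cudaStreamWaitEvent(streams[i], events[i - 1], 0);
        }
        step<<<1, 1, 0, streams[i]>>>(d_state);
        cudaEventRecord(events[i], streams[i]);
    }

    // The host must also wait before it reads the pinned variable.
    cudaError_t err = cudaDeviceSynchronize();
    int result = (err == cudaSuccess) ? *h_state : -1;

    for (int i = 0; i < n_kernels; ++i) {
        cudaEventDestroy(events[i]);
        cudaStreamDestroy(streams[i]);
    }
    delete[] events;
    delete[] streams;
    cudaFreeHost(h_state);
    return result;
}

int main()
{
    int final_value = run_chain(8);
    printf("final value: %d (expected 8)\n", final_value);
    return final_value == 8 ? 0 : 1;
}
```

Because each kernel dereferences the pointer when it executes, it sees the value written by the previous kernel. The event wait only orders execution. It does not copy anything.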
Does anyone have an idea what the problem is? It seems that the latency of copying the data is significant. The other possibility I see is that all the kernels read their variable at launch and then wait for the “go” signal from StreamWaitEvent.
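On the second possibility: kernel arguments are evaluated by the host and copied when the launch statement executes, not when the kernel starts running. If the kernel takes the pinned variable by value, each launch in the loop captures whatever the host reads at that moment, and the stream wait only delays execution. The sketch below shows this with a host-side write issued after the launch. It assumes a by-value `int` argument, and the kernel `record_arg` and helper `captured_value` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stores the by-value argument it received so the
// host can check what the kernel actually got.
__global__ void record_arg(int *out, int value)
{
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        *out = value;
    }
}

// Hypothetical helper: returns the value the kernel received, or -1 on error.
int captured_value()
{
    int *h_state = nullptr;   // pinned host variable (the "initial condition")
    int *d_out = nullptr;     // device buffer for the kernel's report
    if (cudaHostAlloc((void **)&h_state, sizeof(int), cudaHostAllocDefault) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) {
        cudaFreeHost(h_state);
        return -1;
    }

    *h_state = 5;
    // The host evaluates *h_state here, and the launch copies that value.
    record_arg<<<1, 1>>>(d_out, *h_state);
    // Changing the variable afterwards does not affect the queued argument,
    // even if the kernel has not started yet.
    *h_state = 7;

    int result = -1;
    // cudaMemcpy on the default stream waits for record_arg to finish.
    if (cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1;
    }

    cudaFree(d_out);
    cudaFreeHost(h_state);
    return result;
}

int main()
{
    printf("kernel received: %d\n", captured_value());   // prints 5, not 7
    return 0;
}
```

Passing a device pointer to the pinned variable instead of its value avoids this capture, because the kernel then reads the memory when it executes.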
Thanks for your help,