How to create a lock shared by the host and the kernel?

Hi,
Is there an interface for creating a lock that the host and the device can share at the same time, so that they can synchronize with each other?

Hi,

Would you mind sharing more of your use case?
In general, the host can call cudaDeviceSynchronize() to wait until all previously issued GPU work has finished.

Thanks.

Thanks for the reply.
I would like my program to run strictly in this sequence:
for (int i = 0; i < 100; ++i) {
    cpu_do_something();
    gpu_do_something();
}
However, I don’t want to launch a new kernel in each iteration, because the GPU’s cached data seems to be invalidated between kernel launches. I would like a single kernel to stay resident and keep using its cache throughout, for better performance. Is this possible?
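
To make the requested ordering concrete, here is a minimal host-only sketch of a turn-taking flag, written in plain C++ rather than CUDA. A std::thread stands in for the single long-lived kernel, and run_lockstep, Turn and the 'C'/'G' trace are hypothetical names invented for this illustration. It only shows the handshake protocol; it does not show that a flag shared between a CPU and a running kernel is supported on any particular platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Who is allowed to run next. Done tells the worker to exit.
enum class Turn : int { Host = 0, Device = 1, Done = 2 };

// Runs `iterations` rounds of "CPU step, then device step" with one
// long-lived worker thread (a stand-in for one persistent kernel).
// Returns the order in which the steps actually ran.
std::vector<char> run_lockstep(int iterations) {
    std::vector<char> trace;              // only touched by whoever holds the turn
    std::atomic<Turn> turn{Turn::Host};   // the shared "lock"

    std::thread device([&] {
        for (;;) {
            Turn t;
            // Spin until the host hands over the turn or asks us to stop.
            while ((t = turn.load(std::memory_order_acquire)) == Turn::Host) {
                std::this_thread::yield();
            }
            if (t == Turn::Done) return;
            trace.push_back('G');         // placeholder for gpu_do_something()
            turn.store(Turn::Host, std::memory_order_release);
        }
    });

    for (int i = 0; i < iterations; ++i) {
        trace.push_back('C');             // placeholder for cpu_do_something()
        turn.store(Turn::Device, std::memory_order_release);
        // Wait for the worker to finish its step and hand the turn back.
        while (turn.load(std::memory_order_acquire) != Turn::Host) {
            std::this_thread::yield();
        }
    }

    turn.store(Turn::Done, std::memory_order_release);
    device.join();
    assert(trace.size() == static_cast<std::size_t>(2 * iterations));
    return trace;
}
```

The release store and acquire load on the flag are what make each side's writes visible to the other before the turn changes hands. A GPU version would need an equivalent visibility guarantee for memory shared between host and device, which is an assumption this sketch does not verify.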

Hi,

Since Jetson doesn’t support concurrent access, it is essential to make sure the previous job has finished before the next one starts.
Is it possible to move the loop inside cpu_do_something() and gpu_do_something(), so that each of them is called only once?
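
As a rough illustration of that restructuring, the sketch below models both shapes in plain C++. All function names and the arithmetic inside them are hypothetical placeholders. It assumes the CPU step of iteration i does not depend on the GPU result of iteration i-1; if it does, the two phases cannot be separated this way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-iteration work, standing in for the real functions.
int cpu_step(int i) { return i * 2; }   // host-side preparation for iteration i
int gpu_step(int x) { return x + 1; }   // what the kernel would compute from it

// Original shape: one device call (one kernel launch) per iteration.
std::vector<int> interleaved(int n) {
    std::vector<int> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(gpu_step(cpu_step(i)));
    }
    return out;
}

// Restructured: the loop lives inside each phase.
std::vector<int> cpu_do_all(int n) {
    std::vector<int> inputs;
    for (int i = 0; i < n; ++i) inputs.push_back(cpu_step(i));
    return inputs;
}

// Models a single kernel launch that iterates over every input itself.
std::vector<int> gpu_do_all(const std::vector<int>& inputs) {
    std::vector<int> out;
    for (int x : inputs) out.push_back(gpu_step(x));
    return out;
}

std::vector<int> batched(int n) {
    std::vector<int> inputs = cpu_do_all(n);
    assert(inputs.size() == static_cast<std::size_t>(n));
    return gpu_do_all(inputs);           // the device phase runs exactly once
}
```

With independent iterations both versions produce the same results, but the batched one needs only one launch, so the host only has to wait for the GPU once, after that launch.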

Thanks.
