I’m performing some performance tests on my CUDA app.
In the guide I found something like
“Blocks until the device has completed all preceding requested tasks.”
But I would like to know exactly HOW the blocking is realized.
I have some ideas, but I don’t know which of them is the most plausible:
- the kernel, on completion, signals and unlocks the synchronization function
- the synchronization function polls a condition flag (software or hardware)
- the synchronization function sleeps on a “ready job” queue inside the GPU scheduler’s data structures
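For what it’s worth, the CUDA runtime lets you influence this behavior yourself via `cudaSetDeviceFlags`: `cudaDeviceScheduleSpin` makes the CPU thread busy-poll during a sync, `cudaDeviceScheduleYield` polls but yields the CPU between checks, and `cudaDeviceScheduleBlockingSync` makes the thread sleep until the device signals completion. A minimal sketch (the trivial `dummy` kernel is just a placeholder for illustration):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummy() {
    // placeholder kernel; stands in for real work
}

int main() {
    // Must be set before the context is created:
    // ask the driver to put the CPU thread to sleep during syncs
    // instead of spin-polling (alternatives: cudaDeviceScheduleSpin,
    // cudaDeviceScheduleYield, cudaDeviceScheduleAuto).
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    dummy<<<1, 1>>>();

    // With BlockingSync, the calling thread blocks (sleeps) here
    // until the device has finished all preceding work.
    cudaError_t err = cudaDeviceSynchronize();
    printf("sync returned: %s\n", cudaGetErrorString(err));
    return 0;
}
```

So at least from the host side, both a polling implementation and a sleep/wakeup implementation exist, selectable per context.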