Hi, I have some questions about trtexec.
- I’m confused by the option named ‘useSpinWait’.
As I read the documentation linked below, if I use cudaEventBlockingSync, the CPU thread will busy-wait, that is, spin-wait.
[url]https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EVENT.html#group__CUDART__EVENT_1g949aa42b30ae9e622f6ba0787129ff22[/url]
But trtexec.cpp contains this line:
unsigned int cudaEventFlags = gParams.useSpinWait ? cudaEventDefault : cudaEventBlockingSync;
So it looks as if we have to enable the ‘useSpinWait’ option to avoid busy-waiting, which seems backwards given the option's name.
Could you explain this?
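To show how I am reading that line, here is a minimal sketch of the mapping it performs. The two constants are stand-ins I defined so the snippet compiles without the CUDA toolkit (as far as I know, cuda_runtime.h defines cudaEventDefault as 0x00 and cudaEventBlockingSync as 0x01), and selectEventFlags is a hypothetical name of mine, not a function from trtexec.cpp.

```cpp
#include <cassert>

// Stand-in values (assumption: they match cuda_runtime.h, where
// cudaEventDefault is 0x00 and cudaEventBlockingSync is 0x01).
constexpr unsigned int kEventDefault = 0x00;
constexpr unsigned int kEventBlockingSync = 0x01;

// Hypothetical helper that mirrors the ternary quoted from trtexec.cpp:
// the option being set selects the default flag, otherwise BlockingSync.
constexpr unsigned int selectEventFlags(bool useSpinWait)
{
    return useSpinWait ? kEventDefault : kEventBlockingSync;
}

// Compile-time check of the mapping in both directions.
static_assert(selectEventFlags(true) == kEventDefault,
              "useSpinWait set -> default flag");
static_assert(selectEventFlags(false) == kEventBlockingSync,
              "useSpinWait not set -> blocking-sync flag");
```

In the real code, I assume the resulting value is what gets passed to cudaEventCreateWithFlags(), and the event is later waited on with cudaEventSynchronize().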
- Why do I have to use “cudaEventDefault” to achieve the expected sum of performance?
(This is stated here: [url]https://docs.nvidia.com/jetson/jetpack/release-notes/index.html#early-access-notes[/url])
Could you please explain why running on both the DLA and the GPU makes a difference when I use cudaEventBlockingSync?
And how does cudaEventDefault resolve this problem?
Thank you.