Hi, I have some questions about trtexec.
- I’m confused by the option named ‘useSpinWait’.
It says here that if I use cudaEventBlockingSync, the CPU thread will busy-wait, that is, spin-wait.
But trtexec.cpp contains this line:
unsigned int cudaEventFlags = gParams.useSpinWait ? cudaEventDefault : cudaEventBlockingSync;
So it seems we would have to enable the ‘useSpinWait’ option to avoid busy-waiting, which looks backwards given the option’s name.
Could you explain this?
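To make the two waiting styles concrete, here is a host-only C++ sketch that models them with standard threading primitives instead of CUDA. FakeEvent, spinWait, blockingWait and waitForEvent are hypothetical names, not TensorRT or CUDA APIs. For reference, the CUDA Runtime API documentation describes cudaEventBlockingSync as making a host thread that calls cudaEventSynchronize() block until the event completes. With cudaEventDefault the wait follows the device's scheduling flags and may spin instead. The sketch assumes that mapping, and waitForEvent mirrors the ternary quoted from trtexec.cpp.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a CUDA event: another thread marks it complete.
struct FakeEvent {
    std::atomic<bool> done{false};
    std::mutex m;
    std::condition_variable cv;

    void complete() {
        {
            std::lock_guard<std::mutex> lock(m);
            done.store(true, std::memory_order_release);
        }
        cv.notify_all();  // wakes any thread sleeping in blockingWait
    }
};

// Spin-wait: keep polling, which occupies a CPU core for the whole wait
// (assumed analogue of cudaEventDefault under a spinning schedule).
long spinWait(FakeEvent& ev) {
    long polls = 0;
    while (!ev.done.load(std::memory_order_acquire)) {
        ++polls;
    }
    return polls;
}

// Blocking wait: sleep in the OS until notified; the thread uses no CPU
// while waiting but pays a wake-up cost (assumed analogue of cudaEventBlockingSync).
void blockingWait(FakeEvent& ev) {
    std::unique_lock<std::mutex> lock(ev.m);
    ev.cv.wait(lock, [&] { return ev.done.load(std::memory_order_acquire); });
}

// Mirrors: useSpinWait ? cudaEventDefault : cudaEventBlockingSync
void waitForEvent(FakeEvent& ev, bool useSpinWait) {
    if (useSpinWait) {
        spinWait(ev);
    } else {
        blockingWait(ev);
    }
    assert(ev.done.load());  // either style returns only after completion
}
```

In the spinning branch the waiting thread keeps a core busy until the event fires. In the blocking branch it sleeps and is woken by the notification, which saves CPU time but adds some latency before the thread can continue.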
- Why do I have to use cudaEventDefault to get the expected sum of DLA and GPU performance?
(This is stated in the Early Access notes of the JetPack release notes: https://docs.nvidia.com/jetson/jetpack/release-notes/index.html#early-access-notes)
Could you please explain why running on both DLA and GPU makes a difference when I use cudaEventBlockingSync?
And how does cudaEventDefault resolve this problem?
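One hypothesis worth checking for the DLA+GPU case: if each engine's host loop waits on an event before enqueuing the next inference, any wake-up latency after the event completes becomes idle time on that accelerator. The sketch below only does the arithmetic for that model. The per-inference times and the sync overhead are hypothetical inputs, not measurements, and inferencesPerSecond and combinedInferencesPerSecond are illustrative names.

```cpp
#include <cassert>

// Hypothetical model: the host enqueues one inference, waits on its event,
// then enqueues the next. Any time spent waking up after the event fires is
// time during which the accelerator has nothing queued.
double inferencesPerSecond(double computeMs, double syncOverheadMs) {
    assert(computeMs > 0.0 && syncOverheadMs >= 0.0);
    return 1000.0 / (computeMs + syncOverheadMs);
}

// "Expected sum of performance": if GPU and DLA run independently, the ideal
// combined rate is the sum of the two per-engine rates.
double combinedInferencesPerSecond(double gpuMs, double dlaMs, double syncOverheadMs) {
    return inferencesPerSecond(gpuMs, syncOverheadMs) +
           inferencesPerSecond(dlaMs, syncOverheadMs);
}
```

With hypothetical values of 10 ms per GPU inference and 20 ms per DLA inference and zero overhead, the combined rate is 100 + 50 = 150 inferences/s; any nonzero per-wait overhead pushes it below that sum. The model ignores overlapping work across multiple streams, so it can only show the direction of the effect, not its size.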