Hello, I used nvprof to profile my executable, and I found that cudaStreamCreateWithFlags and cudaFree are very time-consuming:
API calls:
Time(%)  Time      Calls  Avg       Min       Max       Name
58.56%   273.858s  16     17.1161s  16.875us  136.929s  cudaStreamCreateWithFlags
33.46%   156.448s  17     9.20280s  924ns     74.3984s  cudaFree
...
Is this normal?