Why are cudaStreamCreateWithFlags and cudaFree so time-consuming?

Hello, I am using nvprof to profile my executable, and I find that cudaStreamCreateWithFlags and cudaFree are very time-consuming:

API calls:
Time(%)      Time  Calls       Avg       Min       Max  Name
58.56%   273.858s     16  17.1161s  16.875us  136.929s  cudaStreamCreateWithFlags
33.46%   156.448s     17  9.20280s     924ns  74.3984s  cudaFree
...

Is this normal?