My application is a multithreaded CUDA application on Windows that loads multiple CUDA modules, multiple DNN models (Caffe and dlib), and FFmpeg with CUVID decoding and NVENC.
The application's GPU memory usage is about 4 GB.
When I try to run it under Nsight, GPU memory usage exceeds 11 GB, which causes the application to crash.
The only way I can use Nsight is by disabling the loading of some of the DNN models, but then I cannot get a complete view of my application's behaviour.
Why does the memory usage increase so much when using Nsight?
Is there a way to limit it?
I solved the issue.
It seems that Caffe's implementation of the cuDNN layers allocates a cuDNN handle and a CUDA stream for each layer instead of sharing one cuDNN handle between layers.
This caused a few thousand streams to be created when loading the networks. Nsight crashes when that many streams are allocated.
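For anyone hitting the same thing, here is a rough sketch of the difference between the two allocation patterns. It is not Caffe's code and it does not call cuDNN: GpuContext is a hypothetical stand-in that only counts how many handle/stream pairs are alive, and Layer, build_per_layer and build_shared are made-up names for illustration. In a real build the GpuContext constructor would be where cudaStreamCreate, cudnnCreate and cudnnSetStream are called, and the destructor where cudnnDestroy and cudaStreamDestroy are called.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for one cuDNN handle plus its CUDA stream.
// It only counts live instances; no GPU calls are made here.
struct GpuContext {
    static int live;  // number of contexts currently alive
    GpuContext() { ++live; }
    ~GpuContext() { --live; }
    GpuContext(const GpuContext&) = delete;
    GpuContext& operator=(const GpuContext&) = delete;
};
int GpuContext::live = 0;

// Hypothetical layer that holds a reference to a context.
struct Layer {
    std::shared_ptr<GpuContext> ctx;
};

// Pattern described above: every layer creates its own handle and stream.
std::vector<Layer> build_per_layer(int num_layers) {
    assert(num_layers >= 0);
    std::vector<Layer> net;
    for (int i = 0; i < num_layers; ++i) {
        net.push_back(Layer{std::make_shared<GpuContext>()});
    }
    return net;
}

// Alternative: one context is created up front and shared by all layers.
std::vector<Layer> build_shared(int num_layers) {
    assert(num_layers >= 0);
    auto shared = std::make_shared<GpuContext>();
    return std::vector<Layer>(static_cast<std::size_t>(num_layers),
                              Layer{shared});
}
```

With 1000 layers, build_per_layer(1000) leaves GpuContext::live at 1000 while the network is alive, whereas build_shared(1000) leaves it at 1. Whether a single shared handle is appropriate depends on how the layers are scheduled across threads, so treat this as an illustration of the counts only.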
It's great that your problem is solved.
Thanks for the update.