TensorRT's nvinfer1::INetworkDefinition::addFullyConnected() does not work as expected for C3D network

Hi,

Sorry for the late reply.
Do you mean that the network built with the TensorRT API doesn't occupy a similar amount of memory?
If so, would you mind sharing a sample that compares the TensorRT API with the cuDNN API?

Please note that the library is not loaded at engine-building time.
It is only loaded at inference time.
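
You can verify this yourself by logging resident memory around the build and the first inference. Below is a minimal sketch, assuming a Linux target such as Jetson where /proc/self/status is available; rssMb() is a hypothetical helper, and the TensorRT calls in the usage comments stand in for your own code:

#include <fstream>
#include <string>
#include <iostream>

// Hypothetical helper: parse VmRSS (resident set size, in kB) out of /proc/self/status.
static long rssMb() {
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line); ) {
        if (line.rfind("VmRSS:", 0) == 0)
            return std::stol(line.substr(6)) / 1024;  // convert kB to MB
    }
    return -1;
}

// Usage sketch (buildEngineWithConfig/enqueueV2 stand in for your own TensorRT calls):
//   std::cout << "before build:  " << rssMb() << " MB\n";
//   auto* engine = builder->buildEngineWithConfig(*network, *config);
//   std::cout << "after build:   " << rssMb() << " MB\n";
//   context->enqueueV2(bindings, stream, nullptr);   // first inference
//   std::cout << "after enqueue: " << rssMb() << " MB\n";

If the library is loaded lazily, the large jump should appear around the first enqueue rather than around the build.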

Thanks.

Hello,

My inference code was pasted above on 15/Oct. To trigger cuDNN, as in your example code, I only added:

cudnnHandle_t handle_;
cudnnCreate(&handle_);

Then I saw the program take much more memory; the increase amounted to 700+ MB.
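
For reference, a standalone reproducer of this effect could look like the following minimal sketch (assuming the cuDNN development headers are installed; compile with something like g++ repro.cpp -lcudnn):

#include <cudnn.h>
#include <cstdio>

int main() {
    cudnnHandle_t handle;
    // Creating the first handle forces libcudnn and its kernels to load;
    // this is where the large jump in resident memory shows up in jtop.
    cudnnCreate(&handle);
    std::printf("cuDNN handle created; check the process memory in jtop now.\n");
    std::getchar();  // keep the process alive so the usage can be observed
    cudnnDestroy(handle);
    return 0;
}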

Thanks.

Hi,

Yes, the memory is used for loading the cuDNN library, and that takes 600+ MB.

It should be similar in TensorRT.
For layers with a cuDNN implementation, the same library is loaded at inference time.

Thanks.

But when doing inference with my network implemented with the TensorRT API, I didn't see memory usage increase that much; I observed it with jtop.
Besides coding the network with the TensorRT API, is there anything that needs to be configured? Thanks.

Hi,

Could you first run nvprof on your implementation to see in detail which backend APIs are used?

$ sudo /usr/local/cuda-10.2/bin/nvprof [your app]

Although TensorRT generally leverages cuDNN, some operations might use other libraries instead.
Here is my profiling result for sample_mnist; you can see it mainly uses cuBLAS (GEMM) and cuDNN:

==20592== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   11.73%  4.9390ms       296  16.685us     448ns  194.47us  [CUDA memcpy HtoD]
                   10.66%  4.4887ms       149  30.125us     416ns  330.29us  [CUDA memset]
                    3.77%  1.5858ms         8  198.23us  151.85us  247.60us  trt_volta_sgemm_128x128_relu_nn_v1
                    3.66%  1.5408ms        23  66.990us  14.368us  143.85us  void cudnn::cnn::conv2d_grouped_direct_kernel<float, float, float, float, float, float, bool=1, bool=0, int=0, int=0, int=0>(cudnnTensorStruct, float const *, cudnnFilterStruct, float const *, cudnnConvolutionStruct, cudnn::cnn::conv2d_grouped_direct_kernel<float, float, float, float, float, float, bool=1, bool=0, int=0, int=0, int=0>, float*, float, float*, cudnn::reduced_divisor, float, float, float, float, int, cudnnConvolutionStruct const *, float const *, cudnnActivationStruct)
                    3.11%  1.3091ms         8  163.63us  83.940us  245.68us  trt_volta_sgemm_64x64_relu_nn_v1
...
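
If the summary view isn't detailed enough, nvprof can also print one line per kernel launch (--print-gpu-trace is a standard nvprof flag, not specific to this setup):

$ sudo /usr/local/cuda-10.2/bin/nvprof --print-gpu-trace [your app]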

Thanks.

Hello,
I tried to run nvprof with our app on the Jetson Nano that our app usually runs on, but I didn't get output like yours; errors occurred instead. Please see the messages in the picture.
Thanks.

Hi,

Could you run the app (./bright) as root but without nvprof?
There might be an issue if you launch DeepStream as root remotely, since it usually requires a DISPLAY connection.
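
If the remote session is the cause, a common workaround is to point the app at the board's local display before launching (assuming a monitor is attached as display :0; you may also need to allow root access to the X server with xhost +local:):

$ export DISPLAY=:0
$ sudo -E /usr/local/cuda-10.2/bin/nvprof ./bright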

Thanks.

Hello, we often run ./bright as root without problems; we have never seen it crash.
But when running ./bright under nvprof, bright always ran for a while, the GUI appeared and object detection worked, and then the app crashed with the error shown in the screenshot.
Just now I tried it again locally on the Jetson Nano board and got the same result; please see the attached image. Thanks.

Hi,

Unfortunately, it seems there is an issue in the nvprof that ships with CUDA 10.2.
As an alternative, would you mind checking whether you can run Nsight Systems on the app?
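
Once Nsight Systems is installed (it comes with JetPack / SDK Manager), a typical invocation would look like this; the trace selection below is just one reasonable choice:

$ nsys profile --trace=cuda,cudnn,cublas -o report ./bright

This writes a report file that can be opened in the Nsight Systems GUI to inspect which backend libraries the kernels come from.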

Thanks

Hello,
I have been trying to download the Nsight installer .deb package with the latest SDK Manager, but SDK Manager consistently fails to get the configuration file from your cloud server in Step 2. I don't think this is a network problem on my side, since I can access Google, Twitter, YouTube, etc. I'll update you once your server is working again.