NCHWToNCHHW2 error while running a model with two output blobs

I have a modified version of GoogleNet, which has two output blobs instead of one.
I have been trying to profile this network using link.

For benchmarking I used the ‘trt-bench’ executable. I was able to get an inference time of 5 ms for the original GoogleNet provided with the git repo.
I also get 5 ms for my network if only one of my output blob names is specified.

When I pass a vector of output blob names, the code compiles correctly, since it supports multiple outputs.
But at runtime I get this error:

[TRT] (1036) - Cuda Error in NCHWToNCHHW2: 4
[TRT] (1036) - Cuda Error in NCHWToNCHHW2: 4
[cuda]   cudaStreamSynchronize(stream)
[cuda]      unspecified launch failure (error 4) (hex 0x04)
[cuda]      /app/jetson-inference/imageNet.cpp:319
[TRT]  imageNet::Process() -- failed to enqueue TensorRT network
GPU network failed to process

Can someone guide me on what I am doing wrong? Any insights would be extremely helpful.

Thank you.


You need to modify the output buffer handling for your customized design: with two output blobs, the network has three bindings (one input, two outputs), and each output needs its own correctly sized device buffer. A missing or undersized buffer for the second output will cause the kernel to fault, which shows up as the CUDA error 4 (unspecified launch failure) you are seeing.
Check this code first: