I have a modified version of GoogleNet, which has two output blobs instead of one.
I have been trying to profile this network using https://github.com/dusty-nv/jetson-inference.
For benchmarking I used the ‘trt-bench’ executable. I was able to get an inference time of 5 ms for the original GoogleNet provided with the git repo.
I was also able to get 5 ms for my network when I specify only one of my output blob names.
When I pass a vector of output blob names instead, the code compiles correctly, since it supports multiple outputs.
But at run time I get this error:
[TRT] reformat.cu (1036) - Cuda Error in NCHWToNCHHW2: 4
[TRT] reformat.cu (1036) - Cuda Error in NCHWToNCHHW2: 4
[cuda] cudaStreamSynchronize(stream)
[cuda] unspecified launch failure (error 4) (hex 0x04)
[cuda] /app/jetson-inference/imageNet.cpp:319
[TRT] imageNet::Process() -- failed to enqueue TensorRT network
GPU network failed to process
Can someone tell me what I am doing wrong? Any insights would be extremely helpful.