NCHWToNCHHW2 error while running a model with two output blobs

I have a modified version of GoogleNet which has two output blobs instead of one.
I have been trying to profile this network using https://github.com/dusty-nv/jetson-inference.

For benchmarking I used the ‘trt-bench’ executable. I was able to get a speed of 5 ms for the original GoogleNet provided with the git repo.
I was also able to get 5 ms for my network if only one of my output blob names is specified.

When I pass a vector of output blob names, the code compiles correctly since it supports multiple outputs.
But at runtime I get this error:

[TRT]  reformat.cu (1036) - Cuda Error in NCHWToNCHHW2: 4
[TRT]  reformat.cu (1036) - Cuda Error in NCHWToNCHHW2: 4
[cuda]   cudaStreamSynchronize(stream)
[cuda]      unspecified launch failure (error 4) (hex 0x04)
[cuda]      /app/jetson-inference/imageNet.cpp:319
[TRT]  imageNet::Process() -- failed to enqueue TensorRT network
GPU network failed to process

Can someone guide me on what I am doing wrong? Any insights would be extremely helpful.

Thank you.

Hi,

You will need to modify the output buffer allocation to match your customized design, since the default code assumes a single output blob.
Check this code first:
https://github.com/dusty-nv/jetson-inference/blob/master/tensorNet.cpp#L326

Thanks.