How the deploy.prototxt input size affects a TensorRT model


I am using detectnet-camera.cpp to convert my fine-tuned Caffe DetectNet model into a TensorRT model.

I know that when I use my Caffe model with NVcaffe, the input blob size specified in deploy.prototxt is irrelevant, since the input blob is reshaped to whatever the actual input is (e.g. here ).
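For context, the input shape in question is declared at the top of deploy.prototxt, roughly like this (the dims below are illustrative, and older prototxts spell this as four separate `input_dim:` lines instead):

```prototxt
input: "data"
input_shape {
  dim: 1    # batch size
  dim: 3    # channels
  dim: 450  # height
  dim: 450  # width
}
```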

However, when the caffemodel is converted to TensorRT, the deploy.prototxt input blob size makes a huge difference in inference speed. I know this because the default FaceNet network runs at 45 FPS at its default size of 450x450, but if this is changed to 1920x1250 it drops to 8 FPS! Another weird thing is that the actual input frame size from the camera is 1280x720, not 450x450.

So my question is:

  1. How does TensorRT use the deploy.prototxt input size specifications?
  2. If an image of a different size is fed into the network, what happens?
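To put rough numbers on the slowdown: if TensorRT bakes the deploy.prototxt dimensions into the engine, and each camera frame is resized to that shape before inference, then per-frame convolution work should scale roughly with the pixel count of the engine's input. A back-of-envelope sketch of the 450x450 vs. 1920x1250 case:

```python
# Back-of-envelope: convolution cost grows roughly linearly with the
# number of input pixels, so enlarging the engine's input dims in
# deploy.prototxt makes every frame proportionally more expensive.
small = 450 * 450    # default FaceNet input size
large = 1920 * 1250  # enlarged input size
ratio = large / small
print(round(ratio, 2))  # ~11.85x more pixels per frame
```

That factor is larger than the observed 45/8 ≈ 5.6x FPS drop; the exact ratio also depends on memory bandwidth, layer mix, and how saturated the GPU was at the smaller size, so this is only a rough bound.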


Incidentally, I'm also trying to run similar experiments and have exactly the same questions.

I use the giexec executable that comes in the TensorRT bin directory.
I took a pretrained GoogLeNet, which is trained at 224x224, and modified the input dimensions to 1080x2048. The purpose was to see the impact of input resolution on overall efficiency and throughput. I get an error that the inference engine could not be built, and I see this error for any dimensions other than the 224x224 the network was trained at.

I see the same behavior with ResNet-50/18.

However, when I repeat the same experiment with MobileNet, it works up to batch 16 on a P4 and batch 32 on a V100.
With higher-resolution images there is a drop in throughput (images/sec), as expected.

One reason the higher batches fail could be that the system is constrained by memory or element count (I'm not sure).

I'm still not able to nail down why even low batch sizes (1, 2, 4, …) fail for GoogLeNet or ResNet.
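One possible explanation worth checking (sketched here with illustrative dimensions, not taken from the actual prototxts): GoogLeNet and ResNet end with a fixed-kernel average pool followed by a fully connected layer whose weights assume a 1x1 spatial map, so changing the input resolution changes the flattened size and the builder rejects the network regardless of batch size, while MobileNet variants that use global pooling adapt to any resolution.

```python
# Sketch: why a fixed-kernel pool + fully connected head may only be
# buildable at the training resolution (dims are illustrative assumptions).

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1

STRIDE = 32  # typical cumulative downsampling before the final pool

# Native 224x224 input: the 7x7 feature map is reduced to 1x1 by a
# fixed 7x7 pool, matching what the fully connected layer expects.
native = 224 // STRIDE            # 7
pooled = conv_out(native, 7)      # 1

# 1080x2048 input: the pre-pool map is 33x64, a fixed 7x7 pool no
# longer yields 1x1, the flattened size mismatches the FC weights,
# and engine building fails.
h, w = 1080 // STRIDE, 2048 // STRIDE  # 33, 64
print(native, pooled, h, w)
```

A network whose final pool is declared with `global_pooling: true` instead of a fixed kernel would collapse any HxW to 1x1, which could be why MobileNet builds at the larger resolution.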

Any help understanding this behavior would be highly appreciated.

Cheers!