We are using an object-detection (O.D.) network, SSD512, and we are trying to decide whether it will run faster for us with batch=2 than with the classic computation of a single image at a time.
So we ran a test on a desktop machine with a 1080Ti, and we saw that the execution time for batch=2 is 160% of the time a single image takes. The input tensor was [N=2, C=3, H=512, W=512].
Assuming that two images executed one by one take 200% of the single-image time, batch=2 consumes less time per image on average: to be precise, 80% of the original time, thus saving 20% of the execution time. This looks promising…
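The arithmetic above can be sketched as follows (the times are normalized; only the measured 160% figure comes from our test):

```python
# Per-image cost: batch=1 vs batch=2.
t_single = 1.0               # normalized time for one image, batch=1
t_batch2 = 1.6 * t_single    # measured: two images at batch=2 take 160%

per_image_sequential = t_single     # 100% per image, one by one
per_image_batched = t_batch2 / 2    # 80% per image at batch=2

saving = 1 - per_image_batched / per_image_sequential
print(f"per-image time with batch=2: {per_image_batched:.0%}")  # 80%
print(f"saving: {saving:.0%}")                                  # 20%
```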
We tried to run the same test on Xavier, but when converting the ".uff" file from the previous test with "trtexec", it crashed after several minutes of building the computational graph with the error:
trtexec: trtexec.cpp:360: void createMemory(const nvinfer1::ICudaEngine&, std::vector<void*>&, const string&): Assertion `(bindingIndex < (int) buffers.size()) && "Input/output name not found in network"' failed.
So I took the original SSD512 ".uff" file that takes a single image as input [N=???, C=3, H=512, W=512] and ran "trtexec" on it with batch=2. The execution time was almost 200% of the single-image execution time, which means that on Xavier, using batch=2 gives no advantage.
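For reference, the command we would expect to reproduce this is something like the following sketch (the file paths and the output blob name are placeholders, not our actual values; the flags are those of the UFF-era "trtexec" from TensorRT 5/6):

```shell
# Build and time an engine from a UFF file at batch=2 (sketch; names are placeholders).
trtexec --uff=ssd512.uff \
        --uffInput=Input,3,512,512 \
        --output=NMS \
        --batch=2 \
        --saveEngine=ssd512_b2.plan
```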
We aren't using any profiling tool because we are investigating the "net" execution time of the TensorRT engine, which is a "black box" from our point of view as users of TensorRT.
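Our measurement itself is plain wall-clock timing around the engine's execute call; a minimal sketch of that harness is below ("run_inference" is a stand-in for the TensorRT execution call, which we treat as a black box, and the warm-up/iteration counts are illustrative):

```python
import time

def run_inference(batch):
    """Placeholder for the TensorRT engine's execute() call."""
    time.sleep(0.001 * len(batch))  # stand-in workload proportional to batch size

def net_time(batch, warmup=5, iters=50):
    # Warm-up iterations so one-time initialization is excluded from the average.
    for _ in range(warmup):
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(batch)
    return (time.perf_counter() - start) / iters

t1 = net_time([0])       # batch=1
t2 = net_time([0, 1])    # batch=2
print(f"batch=2 time is {t2 / t1:.0%} of batch=1")
```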
Our questions are:
Did we do something wrong in this execution-time test? And is there a way to overcome the crash when converting the ".uff" file to a "plan" file?