A small model takes a long time to run

Software Version
DRIVE OS Linux 5.2.0 and DriveWorks 3.5

Target Operating System
Linux

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)

SDK Manager Version
1.6.1.8175

Host Machine Version
native Ubuntu 18.04

singlehead_pfe_40000.onnx (4.2 KB)

Hello, this is a small model, but inference takes 82 ms on Xavier in FP16 and 79 ms in INT8.
Could you help explain why this model takes so long?

Hi,

It seems that your model was created with batchsize=40000.
The input size is [40000,32,10] and the output size is [40000,1,64].

Do you have any cross-batch operations?
If not, would you mind generating a batchsize=1 model and sharing the performance?
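
To illustrate what a cross-batch operation means here, the following is a minimal sketch in plain Python. The toy data and the helper names `per_sample_scale` and `cross_batch_mean` are hypothetical and are not taken from the attached model. A per-sample operation gives the same result for an entry regardless of what else is in the batch, while a cross-batch operation does not, which is why a model without such operations can usually be re-exported with batchsize=1.

```python
# Hypothetical toy data: 3 batch entries, each holding 2 values.
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def per_sample_scale(sample, factor=2.0):
    """Per-sample op: the result for one entry depends only on that entry."""
    return [v * factor for v in sample]


def cross_batch_mean(samples):
    """Cross-batch op: every output value mixes data from all entries."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


# The per-sample result for entry 0 is the same with or without the other entries.
scaled_first = per_sample_scale(batch[0])

# The cross-batch result changes when the batch contents change.
mean_full_batch = cross_batch_mean(batch)
mean_single_entry = cross_batch_mean(batch[:1])

print(scaled_first, mean_full_batch, mean_single_entry)
```

If the model only contains operations of the first kind, the leading dimension can safely be treated as independent samples.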

Thanks.

So the 40000 is interpreted as the batch size, not as the number of channels?
I will create a model with a 1x40000x32x10 input and test it.
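
As a quick sanity check on the proposed shape, here is a small sketch in plain Python. The helper name `add_batch_dim` is hypothetical. It only shows that prepending a batch dimension of 1 keeps the total number of elements unchanged, so the same data fits in the new input. It does not re-export the model.

```python
from math import prod


def add_batch_dim(shape):
    """Prepend a batch dimension of 1 to a shape tuple."""
    return (1, *shape)


old_shape = (40000, 32, 10)          # leading 40000 is read as the batch size
new_shape = add_batch_dim(old_shape)  # 40000 moves to the second dimension

# Both shapes hold the same number of elements.
print(new_shape, prod(old_shape), prod(new_shape))
```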

Hi,

Yes. We expect the model to be in NCHW or NHWC format.
The first dimension is used as the batch size.
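
As a rough illustration of how a 4-D shape is read under each layout, here is a minimal sketch in plain Python. The helper `label_dims` is hypothetical and is not part of any NVIDIA tool. It simply pairs each dimension with its layout letter.

```python
def label_dims(shape, layout="NCHW"):
    """Pair each dimension of a shape with its letter in the given layout."""
    if len(shape) != len(layout):
        raise ValueError(f"expected {len(layout)} dimensions, got {len(shape)}")
    return dict(zip(layout, shape))


# The 1x40000x32x10 input discussed above, read under both layouts.
as_nchw = label_dims((1, 40000, 32, 10), "NCHW")
as_nhwc = label_dims((1, 40000, 32, 10), "NHWC")

print(as_nchw)
print(as_nhwc)
```

In both layouts the first entry is N, so a leading 40000 would be read as a batch of 40000 samples.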

Thanks.

Hi,
Thank you very much. I get it.