ONNX to TRT using trtexec gives output only on batch size 1

Description

Hi everyone,
1) I ran the command /usr/src/tensorrt/bin/trtexec --onnx=edsr_simplified.onnx --shapes='sr_input:0':2x3x640x360 --workspace=10000
With a batch size of 2, it fails with the following error:
[10/14/2020-13:48:00] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/14/2020-13:48:00] [W] [TRT] onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/14/2020-13:49:17] [E] [TRT] …/builder/cudnnBuilderUtils.cpp (372) - Cuda Error in findFastestTactic: 700 (an illegal memory access was encountered)
[10/14/2020-13:49:17] [E] [TRT] …/rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
Aborted (core dumped)

2) The same command does work with a batch size of 1: /usr/src/tensorrt/bin/trtexec --onnx=edsr_simplified.onnx --shapes='sr_input:0':1x3x640x360 --workspace=10000
However, the reported throughput is zero, as shown below.

[10/14/2020-13:44:04] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[10/14/2020-13:44:08] [I] Warmup completed 0 queries over 200 ms
[10/14/2020-13:44:08] [I] Timing trace has 0 queries over 3.36078 s
[10/14/2020-13:44:08] [I] Trace averages of 10 runs:
[10/14/2020-13:44:08] [I] Average on 10 runs - GPU latency: 128.417 ms - Host latency: 129.52 ms (end to end 256.644 ms)
[10/14/2020-13:44:08] [I] Average on 10 runs - GPU latency: 130.264 ms - Host latency: 131.368 ms (end to end 258.942 ms)
[10/14/2020-13:44:08] [I] Host latency
[10/14/2020-13:44:08] [I] min: 124.672 ms (end to end 246.965 ms)
[10/14/2020-13:44:08] [I] max: 147.879 ms (end to end 287.596 ms)
[10/14/2020-13:44:08] [I] mean: 130.536 ms (end to end 258.53 ms)
[10/14/2020-13:44:08] [I] median: 127.569 ms (end to end 253.264 ms)
[10/14/2020-13:44:08] [I] percentile: 147.879 ms at 99% (end to end 287.596 ms at 99%)
[10/14/2020-13:44:08] [I] throughput: 0 qps
[10/14/2020-13:44:08] [I] walltime: 3.36078 s
[10/14/2020-13:44:08] [I] GPU Compute
[10/14/2020-13:44:08] [I] min: 123.57 ms
[10/14/2020-13:44:08] [I] max: 146.762 ms
[10/14/2020-13:44:08] [I] mean: 129.433 ms
[10/14/2020-13:44:08] [I] median: 126.466 ms
[10/14/2020-13:44:08] [I] percentile: 146.762 ms at 99%
[10/14/2020-13:44:08] [I] total compute time: 3.23583 s
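
As a sanity check on the 0 qps figure, the other numbers in this log can be used to estimate what the throughput should roughly have been. This is only a sketch: it assumes that "total compute time" is the sum of the per-run GPU compute times and that each run counts as one query. The variable names are illustrative and are not taken from trtexec.

```python
# Values copied from the trtexec log above.
total_compute_s = 3.23583   # "total compute time"
mean_gpu_ms = 129.433       # "GPU Compute" mean
walltime_s = 3.36078        # "walltime"

# Assumption: total compute time = number of runs * mean GPU compute time.
estimated_runs = round(total_compute_s / (mean_gpu_ms / 1000.0))

# Assumption: throughput = completed runs / walltime, one query per run.
estimated_qps = estimated_runs / walltime_s

print(f"estimated runs: {estimated_runs}")
print(f"estimated throughput: {estimated_qps:.2f} qps")
```

Under these assumptions the log implies about 25 runs and roughly 7.4 qps, so the reported 0 qps appears to follow from the "0 queries" count in the warmup and timing-trace lines rather than from the measured latencies.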

Environment

TensorRT Version: 7.0.0.11
GPU Type: T4
Nvidia Driver Version: 440+
CUDA Version: 10.2
Operating System + Version: 18.04

Hi @GalibaSashi,
Could you please share your model and the script so that we can help you better?

Thanks!

Hi @AakankshaS,
Shall I DM you the model? Should I send the TensorFlow model, the converted ONNX model, or both?

Hi @AakankshaS, I have sent you the model. Could you please check it?

Hi, I have sent you the model. Could you please check it?

Hi @GalibaSashi,
I could reproduce the issue.
The team is looking into this.
Please allow us some time.
Thanks!

Hi @AakankshaS,
Is there any update on this?

Hi @GalibaSashi,
Can you please try running your model with the latest TensorRT release?

Thanks!