Superscaling model RCAN not benchmarked when trtexec is used


I took a general RCAN model, converted it to ONNX, and simplified the ONNX file.
Then I converted the ONNX model to a TensorRT engine using the following command:
sudo /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=rcan_simplified.onnx --shapes=‘sr_input:0’:1x3x360x640 --int8 --workspace=12000 --saveEngine=rcan.trt

A CUDA out-of-memory warning appears, and the final results report the throughput as 0 qps:
[10/29/2020-14:33:41] [I] Average on 10 runs - GPU latency: 467.399 ms - Host latency: 468.52 ms (end to end 669.498 ms)
[10/29/2020-14:33:41] [I] Host latency
[10/29/2020-14:33:41] [I] min: 463.218 ms (end to end 665.385 ms)
[10/29/2020-14:33:41] [I] max: 471.827 ms (end to end 672.859 ms)
[10/29/2020-14:33:41] [I] mean: 468.52 ms (end to end 669.498 ms)
[10/29/2020-14:33:41] [I] median: 468.875 ms (end to end 669.798 ms)
[10/29/2020-14:33:41] [I] percentile: 471.827 ms at 99% (end to end 672.859 ms at 99%)
[10/29/2020-14:33:41] [I] throughput: 0 qps
[10/29/2020-14:33:41] [I] walltime: 4.87749 s
[10/29/2020-14:33:41] [I] GPU Compute
[10/29/2020-14:33:41] [I] min: 462.029 ms
[10/29/2020-14:33:41] [I] max: 470.729 ms
[10/29/2020-14:33:41] [I] mean: 467.399 ms
[10/29/2020-14:33:41] [I] median: 467.717 ms
[10/29/2020-14:33:41] [I] percentile: 470.729 ms at 99%
[10/29/2020-14:33:41] [I] total compute time: 4.67399 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=rcan_simplified.onnx --shapes=‘sr_input:0’:1x3x360x640 --int8 --workspace=12000 --saveEngine=rcan.trt
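
As a sanity check on the numbers in this log, here is a minimal Python sketch that recomputes the throughput by hand. It assumes throughput means completed runs divided by wall-clock time, which may not be exactly how trtexec derives its figure; the variable names are only illustrative.

```python
# Figures copied from the trtexec summary in this post.
runs = 10                      # "Average on 10 runs"
walltime_s = 4.87749           # "walltime"
mean_gpu_latency_ms = 467.399  # "GPU Compute" mean
total_compute_s = 4.67399      # "total compute time"

# Assumption: throughput = completed runs / wall-clock time.
throughput_qps = runs / walltime_s

# Ceiling if the GPU ran inferences back to back with no host overhead.
gpu_bound_qps = 1000.0 / mean_gpu_latency_ms

# Consistency check: runs * mean latency should match the total compute time.
assert abs(runs * mean_gpu_latency_ms / 1000.0 - total_compute_s) < 1e-3

print(f"throughput from walltime: {throughput_qps:.2f} qps")  # 2.05 qps
print(f"GPU-bound ceiling:        {gpu_bound_qps:.2f} qps")   # 2.14 qps
```

Both values come out at roughly 2 qps, so the engine does appear to have been benchmarked. The reported 0 qps may be a rounding or reporting artifact of this trtexec build rather than a failed run, but that is only a guess from the log.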

Please suggest a solution.


TensorRT Version:
GPU Type: T4
NVIDIA Driver Version: 440+
CUDA Version: 10.2
Operating System + Version: 18.04

Relevant Files

The logs and model files are shared below.

Hi, can anybody help?

Hi @GalibaSashi,
Could you please try running your model with the latest TensorRT release?
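
Since the TensorRT version field in the post is empty, it may help to confirm which release is actually installed before and after upgrading. A minimal sketch, assuming the `tensorrt` Python bindings are installed alongside trtexec (the helper name `installed_tensorrt_version` is hypothetical):

```python
def installed_tensorrt_version():
    """Return the TensorRT version string, or None if the bindings are missing."""
    try:
        import tensorrt as trt  # assumption: Python bindings are installed
    except ImportError:
        return None
    return trt.__version__


version = installed_tensorrt_version()
if version is None:
    print("TensorRT Python bindings not found; check the system package instead.")
else:
    print(f"TensorRT version: {version}")
```

Including this version in the environment details makes it easier to tell whether the 0 qps report is specific to an older release.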