Tao-converter mask_rcnn int8 engine creation fails

Please provide the following information when requesting support.

• Hardware A40
• Network Type Mask_rcnn
• TLT Version tao-toolkit-tf:v3.21.08-py3
Deploying in the TensorRT 21.08 container, with the open-source plugin install script:
/opt/tensorrt/install_opensource.sh -b 21.08

peoplenetPruned.txt (2.0 KB)

• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

I am attempting to deploy a mask_rcnn “peopleSegnet” exported model using tao-converter. The conversion completes successfully with FP32 and FP16 but fails with INT8. I use the following command:
./tao-converter -d 3,576,960 \
  -k nvidia_tlt \
  -o generate_detections,mask_fcn_logits/BiasAdd \
  -c /workspace/bantam/peopleNet/100221Calibration.cache \
  -e int8.engine \
  -b 6 \
  -m 6 \
  -w 15000000000 \
  -t int8 \
  /workspace/bantam/peopleNet/pruned100221_exported.etlt
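As a sanity check before digging deeper, a small helper like this can verify that the .etlt model and the calibration cache exist and are non-empty before invoking tao-converter. The `check_inputs` function is purely an illustrative sketch, not part of tao-converter:

```shell
# check_inputs: verify each given file exists and is non-empty (illustrative helper)
check_inputs() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  echo "inputs look OK"
}

# e.g. before running tao-converter:
# check_inputs /workspace/bantam/peopleNet/pruned100221_exported.etlt \
#              /workspace/bantam/peopleNet/100221Calibration.cache
```

A missing or zero-byte calibration cache is a common cause of INT8-only failures, so ruling that out first is cheap.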

I get the following message:
[ERROR] 1: Unexpected exception std::bad_alloc
[ERROR] Unable to create engine
Segmentation fault (core dumped)

If I increase the workspace size to 25,000,000,000 bytes, I get this error:

[WARNING] Memory requirements of format conversion cannot be satisfied during timing, format rejected.
[WARNING] Internal error: cannot reformat, disabling format. Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
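For reference, the -w flag is specified in bytes, so plain shell arithmetic (nothing tao-specific) shows how large these workspace requests actually are:

```shell
# Convert the -w workspace values from bytes to GiB (integer division)
echo $(( 15000000000 / 1024 / 1024 / 1024 ))   # first attempt, roughly 13 GiB
echo $(( 25000000000 / 1024 / 1024 / 1024 ))   # second attempt, roughly 23 GiB
```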

Can you try:
$ ./tao-converter -k nvidia_tlt -d 3,576,960 -o generate_detections,mask_fcn_logits/BiasAdd -t int8 -c peoplesegnet_resnet50_int8.txt -m 1 -w 100000000 peoplesegnet_resnet50.etlt

I receive:

[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2985, GPU 1582 (MiB)
[ERROR] 1: Unexpected exception std::bad_alloc
[ERROR] Unable to create engine
Segmentation fault (core dumped)

I am using tao-converter for CUDA 11.3 / cuDNN 8.1 / TensorRT 8.0 - is this correct for container nvcr.io/nvidia/tensorrt:21.08-py3?

So, you downloaded tao-converter inside the nvcr.io/nvidia/tensorrt:21.08-py3 container and then generated the TRT engine?

yes

Can you run the command below inside nvcr.io/nvidia/tensorrt:21.08-py3 to check the CUDA/TensorRT/cuDNN versions?
$ dpkg -l |grep cuda

ii cuda-cccl-11-4 11.4.43-1 amd64 CUDA CCCL
ii cuda-compat-11-4 470.57.02-1 amd64 CUDA Compatibility Platform
ii cuda-cudart-11-4 11.4.108-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-11-4 11.4.108-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-11-4 11.4.43-1 amd64 CUDA cuobjdump
ii cuda-cupti-11-4 11.4.100-1 amd64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-11-4 11.4.100-1 amd64 CUDA profiling tools interface.
ii cuda-driver-dev-11-4 11.4.108-1 amd64 CUDA Driver native dev stub library
ii cuda-gdb-11-4 11.4.100-1 amd64 CUDA-GDB
ii cuda-memcheck-11-4 11.4.100-1 amd64 CUDA-MEMCHECK
ii cuda-nvcc-11-4 11.4.100-1 amd64 CUDA nvcc
ii cuda-nvdisasm-11-4 11.4.100-1 amd64 CUDA disassembler
ii cuda-nvml-dev-11-4 11.4.43-1 amd64 NVML native dev links, headers
ii cuda-nvprof-11-4 11.4.100-1 amd64 CUDA Profiler tools
ii cuda-nvprune-11-4 11.4.100-1 amd64 CUDA nvprune
ii cuda-nvrtc-11-3 11.3.109-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-11-4 11.4.100-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-11-3 11.3.109-1 amd64 NVRTC native dev links, headers
ii cuda-nvrtc-dev-11-4 11.4.100-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-11-4 11.4.100-1 amd64 NVIDIA Tools Extension
ii cuda-sanitizer-11-4 11.4.108-1 amd64 CUDA Sanitizer
ii cuda-toolkit-11-4-config-common 11.4.108-1 all Common config package for CUDA Toolkit 11.4.
ii cuda-toolkit-11-config-common 11.4.108-1 all Common config package for CUDA Toolkit 11.
ii cuda-toolkit-config-common 11.4.108-1 all Common config package for CUDA Toolkit.
ii libcudnn8 8.2.2.26-1+cuda11.4 amd64 cuDNN runtime libraries
ii libcudnn8-dev 8.2.2.26-1+cuda11.4 amd64 cuDNN development libraries and headers
ii libnccl-dev 2.10.3-1+cuda11.4 amd64 NVIDIA Collective Communication Library (NCCL) Development Files
ii libnccl2 2.10.3-1+cuda11.4 amd64 NVIDIA Collective Communication Library (NCCL) Runtime
ii libnvinfer-bin 8.0.1-1+cuda11.3 amd64 TensorRT binaries
ii libnvinfer-dev 8.0.1-1+cuda11.3 amd64 TensorRT development libraries and headers
ii libnvinfer-plugin-dev 8.0.1-1+cuda11.3 amd64 TensorRT plugin libraries and headers
ii libnvinfer-plugin8 8.0.1-1+cuda11.3 amd64 TensorRT plugin libraries
ii libnvinfer8 8.0.1-1+cuda11.3 amd64 TensorRT runtime libraries
ii libnvonnxparsers-dev 8.0.1-1+cuda11.3 amd64 TensorRT ONNX libraries
ii libnvonnxparsers8 8.0.1-1+cuda11.3 amd64 TensorRT ONNX libraries
ii libnvparsers-dev 8.0.1-1+cuda11.3 amd64 TensorRT parsers libraries
ii libnvparsers8 8.0.1-1+cuda11.3 amd64 TensorRT parsers libraries
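For a quicker summary, the package name and version (columns 2 and 3 of dpkg -l output) can be pulled out with awk. The `summarize` helper below is just one illustrative way to do it; here it is fed a sample line from the listing above rather than live dpkg output:

```shell
# summarize: print package name and version (columns 2 and 3) of installed rows
summarize() { awk '/^ii/ {print $2, $3}'; }

printf 'ii libnvinfer8 8.0.1-1+cuda11.3 amd64 TensorRT runtime libraries\n' | summarize

# To check the container live (run inside nvcr.io/nvidia/tensorrt:21.08-py3):
#   dpkg -l | grep -E 'cuda-cudart|libcudnn8|libnvinfer8' | summarize
```

Note that the listing above shows CUDA 11.4 / cuDNN 8.2 / TensorRT 8.0 packages, which is worth comparing against the versions the downloaded tao-converter build targets.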

Where did you download tao-converter? Can you share the link?

from the page TAO Toolkit Get Started | NVIDIA Developer
I used https://developer.nvidia.com/tao-converter-80

How about generating an FP16 TRT engine? Is that successful?

Also, please try the official demo models mentioned in GitHub - NVIDIA-AI-IOT/deepstream_tao_apps at release/tao3.0.

FP16 and FP32 engines generate successfully and provide accurate inference.