I am trying to use tlt-converter to convert a model using the QAT workflow specified in the DetectNet_v2 example of this doc - https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/text/overview.html
Here is my setup:
For training:
Cloud Environment: Google Cloud, running this container (https://ngc.nvidia.com/catalog/containers/nvidia:tlt-streamanalytics)
Cloud Hardware: NVIDIA GPU (I believe it is a T4), 64 GB RAM, 1 TB storage
Local Setup: Xavier NX
Clock Mode: 15W 6-Core
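For completeness, this is roughly how I set the clock mode on the NX. The mode index below is from memory and may differ on your board; `nvpmodel -q` lists the exact mapping.

```shell
# Query the current power mode and list the available mode indices
sudo nvpmodel -q --verbose

# Select 15W 6-Core (index 2 on my NX; verify against the -q output above)
sudo nvpmodel -m 2

# Pin clocks to the maximum allowed within the selected power budget
sudo jetson_clocks
```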
I was able to complete all of the steps in Section 11 (the QAT workflow). After that, I copied the exported model to the Xavier NX and ran tlt-converter, which produced the error below (the same error also occurs in plain INT8 mode using Step 10):
```
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing dynamic range for tensor output_bbox/BiasAdd, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing dynamic range for tensor output_cov/BiasAdd, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing dynamic range for tensor output_cov/Sigmoid, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[INFO]
[INFO] --------------- Layers running on DLA:
[INFO]
[INFO] --------------- Layers running on GPU:
[INFO] conv1/convolution + activation_1/Relu6, block_1a_conv_1/convolution + block_1a_relu_1/Relu6, block_1a_conv_2/convolution, block_1a_conv_shortcut/convolution + add_1/add + block_1a_relu/Relu6, block_1b_conv_1/convolution + block_1b_relu_1/Relu6, block_1b_conv_2/convolution + add_2/add + block_1b_relu/Relu6, block_2a_conv_1/convolution + block_2a_relu_1/Relu6, block_2a_conv_2/convolution, block_2a_conv_shortcut/convolution + add_3/add + block_2a_relu/Relu6, block_2b_conv_1/convolution + block_2b_relu_1/Relu6, block_2b_conv_2/convolution + add_4/add + block_2b_relu/Relu6, block_3a_conv_1/convolution + block_3a_relu_1/Relu6, block_3a_conv_2/convolution, block_3a_conv_shortcut/convolution + add_5/add + block_3a_relu/Relu6, block_3b_conv_1/convolution + block_3b_relu_1/Relu6, block_3b_conv_2/convolution + add_6/add + block_3b_relu/Relu6, block_4a_conv_1/convolution + block_4a_relu_1/Relu6, block_4a_conv_2/convolution, block_4a_conv_shortcut/convolution + add_7/add + block_4a_relu/Relu6, block_4b_conv_1/convolution + block_4b_relu_1/Relu6, block_4b_conv_2/convolution + add_8/add + block_4b_relu/Relu6, output_cov/convolution, output_cov/Sigmoid, output_bbox/convolution,
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
Killed
```
The same warnings appeared when I ran tlt-converter for Step 9B (on Google Cloud) as shown in the Jupyter notebook, but there the conversion completed without a problem. On the NX, however, I can't figure out what is causing this or how to make it work.
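For reference, the command I am running on the NX is along these lines. The key, file names, input dimensions, and workspace size below are placeholders, not my exact values; the output nodes are the standard DetectNet_v2 ones, which match the tensors named in the warnings above.

```shell
# Sketch of the tlt-converter invocation on the Xavier NX (placeholder values).
# -o : DetectNet_v2 output nodes
# -w : max workspace size in bytes (kept modest given the NX's limited memory)
./tlt-converter resnet18_detector_qat.etlt \
  -k $KEY \
  -c calibration_qat.bin \
  -o output_cov/Sigmoid,output_bbox/BiasAdd \
  -d 3,384,1248 \
  -t int8 \
  -m 8 \
  -w 1073741824 \
  -e resnet18_detector_qat.trt.int8
```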
Would appreciate any guidance/help.