Error when exporting a DetectNet_v2 model in INT8 mode

Hello, I tried to export a DetectNet_v2 model in INT8 mode to generate calibration.bin, but I got this error:

!tlt-export $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_INT8.etlt \
            --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
            -k $KEY \
            --input_dims 3,720,1280 \
            --max_workspace_size 1100000 \
            --export_module detectnet_v2 \
            --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
            --data_type int8 \
            --batches 10 \
            --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
            --cal_batch_size 4 \
            --verbose

Using TensorFlow backend.
2019-11-27 11:13:41,293 [INFO] iva.common.magnet_export: Loading model from /workspace/tlt-experiments/experiment_dir_unpruned/weights/resnet18_detector.tlt
2019-11-27 11:13:41.294458: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-27 11:13:41.338340: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-27 11:13:41.338832: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5f9ce60 executing computations on platform CUDA. Devices:
2019-11-27 11:13:41.338857: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 950M, Compute Capability 5.0
2019-11-27 11:13:41.359972: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2593905000 Hz
2019-11-27 11:13:41.361109: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6460fd0 executing computations on platform Host. Devices:
2019-11-27 11:13:41.361142: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-27 11:13:41.361341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:0a:00.0
totalMemory: 3.95GiB freeMemory: 3.69GiB
2019-11-27 11:13:41.361372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-27 11:13:41.520957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 11:13:41.521002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-27 11:13:41.521012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-27 11:13:41.521151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3448 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-27 11:13:48,201 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-27 11:14:05.070637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-27 11:14:05.070736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 11:14:05.070775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-27 11:14:05.070790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-27 11:14:05.070916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3448 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-11-27 11:14:07,686 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-27 11:14:08,725 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-27 11:14:09.033192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-27 11:14:09.033272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 11:14:09.033296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-27 11:14:09.033316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-27 11:14:09.033405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3448 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from /tmp/tmpPkudrd.ckpt
2019-11-27 11:14:09,179 [INFO] tensorflow: Restoring parameters from /tmp/tmpPkudrd.ckpt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
2019-11-27 11:14:09,434 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2019-11-27 11:14:09,435 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 130 variables.
2019-11-27 11:14:09,554 [INFO] tensorflow: Froze 130 variables.
INFO:tensorflow:Converted 130 variables to const ops.
2019-11-27 11:14:09,600 [INFO] tensorflow: Converted 130 variables to const ops.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
2019-11-27 11:14:40,368 [INFO] iva.common.magnet_export: Calibrating the exported model. Please don't panic as this may take a while.
2019-11-27 11:14:40,368 [ERROR] modulus.export._tensorrt: Specified INT8 but not supported on platform.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_export.py", line 206, in main
  File "./common/magnet_export.py", line 491, in magnet_export
  File "./modulus/export/_tensorrt.py", line 515, in __init__
  File "./modulus/export/_tensorrt.py", line 385, in __init__
AttributeError: Specified INT8 but not supported on platform.

Here is my tlt-export command:

!tlt-export $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_INT8.etlt \
            --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
            -k $KEY \
            --input_dims 3,720,1280 \
            --max_workspace_size 1100000 \
            --export_module detectnet_v2 \
            --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
            --data_type int8 \
            --batches 10 \
            --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
            --cal_batch_size 4 \
            --verbose

Hi m.billson16,
According to the log, you are running on a GTX 950M:

name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124

Unfortunately, the GTX 950M cannot support INT8 operations.
Its CUDA compute capability is only 5.0.
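As a quick sanity check before running an INT8 export, the rule of thumb from NVIDIA's hardware-precision matrix can be sketched as a tiny helper. This is a sketch under the assumption that TensorRT's fast INT8 kernels require compute capability 6.1 or higher; the function name is hypothetical and not part of TLT:

```python
# Sketch: decide whether a GPU's CUDA compute capability allows
# TensorRT INT8 kernels. Assumption: INT8 requires CC >= 6.1,
# per the TensorRT hardware-precision matrix.

def supports_int8(major: int, minor: int) -> bool:
    """Return True if compute capability `major.minor` supports INT8."""
    return (major, minor) >= (6, 1)

# The GTX 950M reports "major: 5 minor: 0" in the log above:
print(supports_int8(5, 0))   # GTX 950M   -> prints False
print(supports_int8(7, 0))   # Tesla V100 -> prints True
```

The major/minor values can be read directly from the "Found device 0 with properties" line that TensorFlow prints at startup.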

Hello Morganh, thank you very much for your help.
So can I still use tlt-converter for DeepStream deployment? The guide says we need INT8 mode to generate the calibration.bin that tlt-converter uses. Do you have any idea? Should I downgrade my NVIDIA GPU drivers?

Can you try another precision mode (fp16 or fp32)? Please refer to the process in https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/

Hello Morganh, I wanted to try another precision mode, like fp16, but when I ran tlt-converter on my laptop I got this error:

[ERROR] runtime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[ERROR] runtime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Do you have any idea?

This is my tlt-converter command:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
               -k $KEY \
               -o output_cov/Sigmoid,output_bbox/BiasAdd \
               -d 3,720,1280 \
               -i nchw \
               -m 64 \
               -t fp16 \
               -e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.engine

Firstly, please check the TLT requirements below, from https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#requirements

Hardware Requirements

Minimum

4 GB system RAM
4 GB of GPU RAM
Single core CPU
1 GPU
50 GB of HDD space

Recommended

32 GB system RAM
32 GB of GPU RAM
8 core CPU
4 GPUs
100 GB of SSD space

Software Requirements
Ubuntu 18.04 LTS
NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
NVIDIA GPU driver v410.xx or above
Note: DeepStream 4.0 - NVIDIA SDK for IVA inference https://developer.nvidia.com/deepstream-sdk is recommended.

In addition, you can try adding the "-w" argument to the command line to lower the maximum workspace size. For example, append "-w 50000000" at the end.
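For a sense of scale, the suggested value is far below the converter's default workspace of 1<<30 bytes. This is plain arithmetic, independent of any TLT code:

```python
# Compare tlt-converter's default workspace size (1 << 30 bytes, per
# the -w default in the help output) with the suggested "-w 50000000".
default_ws = 1 << 30          # 1 GiB
suggested_ws = 50_000_000     # ~47.7 MiB

print(default_ws / 2**20)                # prints 1024.0 (MiB)
print(round(suggested_ws / 2**20, 1))    # prints 47.7 (MiB)
```

On a 4 GB laptop GPU, freeing roughly a gigabyte of workspace can make the difference between an out-of-memory failure and a successful engine build.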

Below is the help of tlt-converter.

$ ./tlt-converter -h
usage: ./tlt-converter [-h] [-v] [-e ENGINE_FILE_PATH]
        [-k ENCODE_KEY] [-c CACHE_FILE]
        [-o OUTPUTS] [-d INPUT_DIMENSIONS]
        [-b BATCH_SIZE] [-m MAX_BATCH_SIZE]
        [-w MAX_WORKSPACE_SIZE] [-t DATA_TYPE]
        [-i INPUT_ORDER]
        input_file

Generate TensorRT engine from exported model

positional arguments:
  input_file            Input file (.etlt exported model).

required flag arguments:
  -d            comma separated list of input dimensions
  -k            model encoding key

optional flag arguments:
  -b            calibration batch size (default 8)
  -c            calibration cache file (default cal.bin)
  -e            file the engine is saved to (default saved.engine)
  -i            input dimension ordering -- nchw, nhwc, nc (default nchw)
  -m            maximum TensorRT engine batch size (default 16)
  -o            comma separated list of output node names (default none)
  -t            TensorRT data type -- fp32, fp16, int8 (default fp32)
  -w            maximum workspace size of TensorRT engine (default 1<<30)
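The out-of-memory error above is also consistent with the large "-m 64" maximum batch size: even just the input binding for a 3x720x1280 tensor grows linearly with the max batch size, and the engine additionally needs weights, activations, and the workspace on top of that. A rough lower-bound estimate (a sketch only; actual TensorRT memory use depends on the tactics it selects):

```python
# Rough lower bound on GPU memory for just the input binding of a
# 3 x 720 x 1280 tensor at various "-m" (max batch size) values.
# fp16 = 2 bytes per element. Real engines also need weights,
# activations, and the "-w" workspace on top of this.
C, H, W = 3, 720, 1280
BYTES_FP16 = 2

def input_bytes(max_batch: int) -> int:
    return max_batch * C * H * W * BYTES_FP16

for m in (1, 16, 64):
    print(f"-m {m}: {input_bytes(m) / 2**20:.1f} MiB")
```

With -m 64 the input binding alone already exceeds 300 MiB on a card with 3.95 GiB total, so lowering -m alongside -w is worth trying.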

I also have the original issue:

Using TensorFlow backend.
2020-09-23 08:51:26.633200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:30.285351: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-23 08:51:30.285616: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:30.286311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:30.286347: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:30.286405: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:30.288083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:30.288180: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:30.290270: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:30.292446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:30.292570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:30.292742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:30.293500: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:30.294110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:30.294161: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:31.215543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:31.215601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:31.215620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:31.215878: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:31.216597: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:31.217281: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:31.217940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-09-23 08:51:36.679384: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.680205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:36.680286: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:36.680365: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:36.680388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:36.680409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:36.680429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:36.680449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:36.680469: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:36.680590: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.681309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.681962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:36.682012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:36.682026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:36.682060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:36.682182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.682920: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.683519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-09-23 08:51:39,975 [DEBUG] iva.common.export.base_exporter: Saving etlt model file at: /workspace/tlt-experiments/classification/export/final_model.etlt.
2020-09-23 08:51:43,403 [DEBUG] modulus.export._uff: Patching keras BatchNormalization...
2020-09-23 08:51:43,404 [DEBUG] modulus.export._uff: Patching keras Dropout...
2020-09-23 08:51:43,404 [DEBUG] modulus.export._uff: Patching UFF TensorFlow converter apply_fused_padding...
2020-09-23 08:51:44.491266: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.491965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:44.492020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:44.492074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:44.492099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:44.492119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:44.492139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:44.492159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:44.492179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:44.492274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.492927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.493513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:44.493551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:44.493565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:44.493573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:44.493679: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.494327: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.494892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-09-23 08:51:44,982 [DEBUG] modulus.export._uff: Unpatching keras BatchNormalization layer...
2020-09-23 08:51:44,982 [DEBUG] modulus.export._uff: Unpatching keras Dropout layer...
2020-09-23 08:51:47.347804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.348577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:47.348681: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:47.348784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:47.348816: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:47.348840: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:47.348876: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:47.348921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:47.348962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:47.349078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.349887: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.350605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:47.351039: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.351824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:47.351878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:47.351939: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:47.351973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:47.351997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:47.352020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:47.352044: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:47.352068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:47.352178: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.352989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.353676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:47.353750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:47.353792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:47.353813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:47.353950: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.354775: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.355454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
DEBUG: convert reshape to flatten node
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['predictions/Softmax'] as outputs
2020-09-23 08:51:48,949 [DEBUG] iva.common.export.base_exporter: Reading input dims from tensorfile.
2020-09-23 08:51:48,949 [DEBUG] modulus.export.data: Opening /workspace/tlt-experiments/classification/export/calibration.tensor with mode=r
2020-09-23 08:51:49,201 [DEBUG] iva.common.export.base_exporter: Input dims: (3, 224, 224)
2020-09-23 08:51:49,225 [DEBUG] modulus.export.data: Opening /workspace/tlt-experiments/classification/export/calibration.tensor with mode=r
2020-09-23 08:51:49,226 [INFO] iva.common.export.base_exporter: Calibration takes time especially if number of batches is large.
2020-09-23 08:51:49,227 [ERROR] modulus.export._tensorrt: Specified INT8 but not supported on platform.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 185, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 263, in run_export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 505, in export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 676, in __init__
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 469, in __init__
AttributeError: Specified INT8 but not supported on platform.

I am using an NVIDIA P100… I think the V100 is the only platform supported for INT8 quantization, as stated in the requirements.

The V100 is not the only platform supporting INT8.
Please see https://developer.nvidia.com/cuda-gpus#compute and https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix. Note that the P100's compute capability is 6.0, which does not meet the INT8 requirement, so the export fails on that GPU as well.
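To illustrate, here are the GPUs mentioned in this thread checked against the rule of thumb that TensorRT INT8 needs compute capability 6.1 or higher. The compute-capability values are taken from the linked pages; treat the hardware-precision matrix as the authoritative source:

```python
# Compute capability of GPUs discussed in this thread, and whether
# TensorRT INT8 is available. Assumption: INT8 requires CC >= 6.1,
# per the linked hardware-precision matrix.
GPUS = {
    "GeForce GTX 950M":     (5, 0),
    "Tesla P100-PCIE-16GB": (6, 0),
    "GeForce GTX 1080":     (6, 1),
    "Tesla V100":           (7, 0),
}

for name, cc in GPUS.items():
    int8 = "yes" if cc >= (6, 1) else "no"
    print(f"{name}: CC {cc[0]}.{cc[1]}, INT8: {int8}")
```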