Error when exporting a DetectNet_v2 model in INT8 mode

Hello, I tried to export a DetectNet_v2 model in INT8 mode to generate calibration.bin, but I got this error:

!tlt-export $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_INT8.etlt \
            --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
            -k $KEY \
            --input_dims 3,720,1280 \
            --max_workspace_size 1100000 \
            --export_module detectnet_v2 \
            --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
            --data_type int8 \
            --batches 10 \
            --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
            --cal_batch_size 4 \
            --verbose

Using TensorFlow backend.
2019-11-27 11:13:41,293 [INFO] iva.common.magnet_export: Loading model from /workspace/tlt-experiments/experiment_dir_unpruned/weights/resnet18_detector.tlt
2019-11-27 11:13:41.294458: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-27 11:13:41.338340: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-27 11:13:41.338832: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5f9ce60 executing computations on platform CUDA. Devices:
2019-11-27 11:13:41.338857: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 950M, Compute Capability 5.0
2019-11-27 11:13:41.359972: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2593905000 Hz
2019-11-27 11:13:41.361109: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6460fd0 executing computations on platform Host. Devices:
2019-11-27 11:13:41.361142: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-11-27 11:13:41.361341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:0a:00.0
totalMemory: 3.95GiB freeMemory: 3.69GiB
2019-11-27 11:13:41.361372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-27 11:13:41.520957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 11:13:41.521002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-27 11:13:41.521012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-27 11:13:41.521151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3448 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-27 11:13:48,201 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-27 11:14:05.070637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-27 11:14:05.070736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 11:14:05.070775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-27 11:14:05.070790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-27 11:14:05.070916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3448 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-11-27 11:14:07,686 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-27 11:14:08,725 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-11-27 11:14:09.033192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-27 11:14:09.033272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-27 11:14:09.033296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-11-27 11:14:09.033316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-11-27 11:14:09.033405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3448 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
INFO:tensorflow:Restoring parameters from /tmp/tmpPkudrd.ckpt
2019-11-27 11:14:09,179 [INFO] tensorflow: Restoring parameters from /tmp/tmpPkudrd.ckpt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
2019-11-27 11:14:09,434 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2019-11-27 11:14:09,435 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 130 variables.
2019-11-27 11:14:09,554 [INFO] tensorflow: Froze 130 variables.
INFO:tensorflow:Converted 130 variables to const ops.
2019-11-27 11:14:09,600 [INFO] tensorflow: Converted 130 variables to const ops.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
2019-11-27 11:14:40,368 [INFO] iva.common.magnet_export: Calibrating the exported model. Please don't panic as this may take a while.
2019-11-27 11:14:40,368 [ERROR] modulus.export._tensorrt: Specified INT8 but not supported on platform.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_export.py", line 206, in main
  File "./common/magnet_export.py", line 491, in magnet_export
  File "./modulus/export/_tensorrt.py", line 515, in __init__
  File "./modulus/export/_tensorrt.py", line 385, in __init__
AttributeError: Specified INT8 but not supported on platform.

Here is my tlt-export command:

!tlt-export $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_INT8.etlt \
            --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
            -k $KEY \
            --input_dims 3,720,1280 \
            --max_workspace_size 1100000 \
            --export_module detectnet_v2 \
            --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
            --data_type int8 \
            --batches 10 \
            --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
            --cal_batch_size 4 \
            --verbose

Hi m.billson16,
According to the log, you are running on a GTX 950M:

name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124

Unfortunately, the GTX 950M cannot support INT8 operations.
Its CUDA compute capability is only 5.0.
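As a quick sanity check before running an INT8 export, the rule of thumb from NVIDIA's hardware-precision matrix can be sketched as a tiny helper. This is a sketch under the assumption that TensorRT's fast INT8 kernels require compute capability 6.1 or higher; the function name is hypothetical and not part of TLT:

```python
# Sketch: decide whether a GPU's CUDA compute capability allows
# TensorRT INT8 kernels. Assumption: INT8 requires CC >= 6.1,
# per the TensorRT hardware-precision matrix.

def supports_int8(major: int, minor: int) -> bool:
    """Return True if compute capability `major.minor` supports INT8."""
    return (major, minor) >= (6, 1)

# The GTX 950M reports "major: 5 minor: 0" in the log above:
print(supports_int8(5, 0))   # GTX 950M   -> prints False
print(supports_int8(7, 0))   # Tesla V100 -> prints True
```

The major/minor values can be read directly from the "Found device 0 with properties" line that TensorFlow prints at startup.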

Hello Morganh, thank you very much for your help.
So can I still use tlt-converter for DeepStream deployment? The guide says we need INT8 mode to generate the calibration.bin that tlt-converter uses. Do you have any idea? Should I downgrade my NVIDIA GPU drivers?

Can you try another precision mode (fp16 or fp32)? Please refer to the process in https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/

Hello Morganh, I wanted to try another precision mode, like fp16, but when I ran tlt-converter on my laptop I got this error:

[ERROR] runtime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[ERROR] runtime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Do you have any idea?

This is my tlt-converter command:

!tlt-converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
               -k $KEY \
               -o output_cov/Sigmoid,output_bbox/BiasAdd \
               -d 3,720,1280 \
               -i nchw \
               -m 64 \
               -t fp16 \
               -e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.engine

Firstly, please check the TLT requirements below, from https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#requirements

Hardware Requirements

Minimum

4 GB system RAM
4 GB of GPU RAM
Single core CPU
1 GPU
50 GB of HDD space

Recommended

32 GB system RAM
32 GB of GPU RAM
8 core CPU
4 GPUs
100 GB of SSD space

Software Requirements
Ubuntu 18.04 LTS
NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
NVIDIA GPU driver v410.xx or above
Note: DeepStream 4.0 - NVIDIA SDK for IVA inference https://developer.nvidia.com/deepstream-sdk is recommended.

In addition, you can try adding the "-w" argument to the command line to lower the maximum workspace size. For example, append "-w 50000000" at the end.
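For a sense of scale, the suggested value is far below the converter's default workspace of 1<<30 bytes. This is plain arithmetic, independent of any TLT code:

```python
# Compare tlt-converter's default workspace size (1 << 30 bytes, per
# the -w default in the help output) with the suggested "-w 50000000".
default_ws = 1 << 30          # 1 GiB
suggested_ws = 50_000_000     # ~47.7 MiB

print(default_ws / 2**20)                # prints 1024.0 (MiB)
print(round(suggested_ws / 2**20, 1))    # prints 47.7 (MiB)
```

On a 4 GB laptop GPU, freeing roughly a gigabyte of workspace can make the difference between an out-of-memory failure and a successful engine build.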

Below is the help of tlt-converter.

$ ./tlt-converter -h
usage: ./tlt-converter [-h] [-v] [-e ENGINE_FILE_PATH]
        [-k ENCODE_KEY] [-c CACHE_FILE]
        [-o OUTPUTS] [-d INPUT_DIMENSIONS]
        [-b BATCH_SIZE] [-m MAX_BATCH_SIZE]
        [-w MAX_WORKSPACE_SIZE] [-t DATA_TYPE]
        [-i INPUT_ORDER]
        input_file

Generate TensorRT engine from exported model

positional arguments:
  input_file            Input file (.etlt exported model).

required flag arguments:
  -d            comma separated list of input dimensions
  -k            model encoding key

optional flag arguments:
  -b            calibration batch size (default 8)
  -c            calibration cache file (default cal.bin)
  -e            file the engine is saved to (default saved.engine)
  -i            input dimension ordering -- nchw, nhwc, nc (default nchw)
  -m            maximum TensorRT engine batch size (default 16)
  -o            comma separated list of output node names (default none)
  -t            TensorRT data type -- fp32, fp16, int8 (default fp32)
  -w            maximum workspace size of TensorRT engine (default 1<<30)
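The out-of-memory error above is also consistent with the large "-m 64" maximum batch size: even just the input binding for a 3x720x1280 tensor grows linearly with the max batch size, and the engine additionally needs weights, activations, and the workspace on top of that. A rough lower-bound estimate (a sketch only; actual TensorRT memory use depends on the tactics it selects):

```python
# Rough lower bound on GPU memory for just the input binding of a
# 3 x 720 x 1280 tensor at various "-m" (max batch size) values.
# fp16 = 2 bytes per element. Real engines also need weights,
# activations, and the "-w" workspace on top of this.
C, H, W = 3, 720, 1280
BYTES_FP16 = 2

def input_bytes(max_batch: int) -> int:
    return max_batch * C * H * W * BYTES_FP16

for m in (1, 16, 64):
    print(f"-m {m}: {input_bytes(m) / 2**20:.1f} MiB")
```

With -m 64 the input binding alone already exceeds 300 MiB on a card with 3.95 GiB total, so lowering -m alongside -w is worth trying.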

I also have the original issue:

Using TensorFlow backend.
2020-09-23 08:51:26.633200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:30.285351: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-23 08:51:30.285616: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:30.286311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:30.286347: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:30.286405: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:30.288083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:30.288180: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:30.290270: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:30.292446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:30.292570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:30.292742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:30.293500: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:30.294110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:30.294161: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:31.215543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:31.215601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:31.215620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:31.215878: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:31.216597: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:31.217281: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:31.217940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-09-23 08:51:36.679384: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.680205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:36.680286: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:36.680365: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:36.680388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:36.680409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:36.680429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:36.680449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:36.680469: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:36.680590: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.681309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.681962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:36.682012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:36.682026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:36.682060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:36.682182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.682920: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:36.683519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-09-23 08:51:39,975 [DEBUG] iva.common.export.base_exporter: Saving etlt model file at: /workspace/tlt-experiments/classification/export/final_model.etlt.
2020-09-23 08:51:43,403 [DEBUG] modulus.export._uff: Patching keras BatchNormalization...
2020-09-23 08:51:43,404 [DEBUG] modulus.export._uff: Patching keras Dropout...
2020-09-23 08:51:43,404 [DEBUG] modulus.export._uff: Patching UFF TensorFlow converter apply_fused_padding...
2020-09-23 08:51:44.491266: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.491965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:44.492020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:44.492074: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:44.492099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:44.492119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:44.492139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:44.492159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:44.492179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:44.492274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.492927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.493513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:44.493551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:44.493565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:44.493573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:44.493679: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.494327: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:44.494892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2020-09-23 08:51:44,982 [DEBUG] modulus.export._uff: Unpatching keras BatchNormalization layer...
2020-09-23 08:51:44,982 [DEBUG] modulus.export._uff: Unpatching keras Dropout layer...
2020-09-23 08:51:47.347804: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.348577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:47.348681: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:47.348784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:47.348816: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:47.348840: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:47.348876: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:47.348921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:47.348962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:47.349078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.349887: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.350605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:47.351039: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.351824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-09-23 08:51:47.351878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-23 08:51:47.351939: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-23 08:51:47.351973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-23 08:51:47.351997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-23 08:51:47.352020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-23 08:51:47.352044: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-23 08:51:47.352068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-23 08:51:47.352178: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.352989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.353676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-23 08:51:47.353750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 08:51:47.353792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-09-23 08:51:47.353813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-09-23 08:51:47.353950: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.354775: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 08:51:47.355454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14649 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
DEBUG: convert reshape to flatten node
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['predictions/Softmax'] as outputs
2020-09-23 08:51:48,949 [DEBUG] iva.common.export.base_exporter: Reading input dims from tensorfile.
2020-09-23 08:51:48,949 [DEBUG] modulus.export.data: Opening /workspace/tlt-experiments/classification/export/calibration.tensor with mode=r
2020-09-23 08:51:49,201 [DEBUG] iva.common.export.base_exporter: Input dims: (3, 224, 224)
2020-09-23 08:51:49,225 [DEBUG] modulus.export.data: Opening /workspace/tlt-experiments/classification/export/calibration.tensor with mode=r
2020-09-23 08:51:49,226 [INFO] iva.common.export.base_exporter: Calibration takes time especially if number of batches is large.
2020-09-23 08:51:49,227 [ERROR] modulus.export._tensorrt: Specified INT8 but not supported on platform.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 185, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 263, in run_export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 505, in export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 676, in __init__
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 469, in __init__
AttributeError: Specified INT8 but not supported on platform.

I am using an NVIDIA P100… I think the V100 is the only platform supported for INT8 quantization, as stated in the requirements.

The V100 is not the only platform supporting INT8.
Please see https://developer.nvidia.com/cuda-gpus#compute and https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix. Note that the P100's compute capability is 6.0, which does not meet the INT8 requirement, so the export fails on that GPU as well.
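To illustrate, here are the GPUs mentioned in this thread checked against the rule of thumb that TensorRT INT8 needs compute capability 6.1 or higher. The compute-capability values are taken from the linked pages; treat the hardware-precision matrix as the authoritative source:

```python
# Compute capability of GPUs discussed in this thread, and whether
# TensorRT INT8 is available. Assumption: INT8 requires CC >= 6.1,
# per the linked hardware-precision matrix.
GPUS = {
    "GeForce GTX 950M":     (5, 0),
    "Tesla P100-PCIE-16GB": (6, 0),
    "GeForce GTX 1080":     (6, 1),
    "Tesla V100":           (7, 0),
}

for name, cc in GPUS.items():
    int8 = "yes" if cc >= (6, 1) else "no"
    print(f"{name}: CC {cc[0]}.{cc[1]}, INT8: {int8}")
```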