SSD Model, TRT, DeepStream, Triton: SegFault

Hi!

I’m trying to export this TF model to TF-TRT and run it with DeepStream nvinferserver, i.e. Triton.

Platform: Jetson TX2.

The model I tried:
ssd_mobilenet_v1_coco_2018_01_28.tar.gz

The code to export the models to TF-TRT:
Tensorflow TensorRT Integration
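
A minimal sketch of what this export looks like with the TF 2.x TrtGraphConverterV2 API (not my exact script; paths and parameter values here are illustrative):

# Minimal TF-TRT export sketch (TF 2.x trt_convert API); paths are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=268435456,
    minimum_segment_size=10,
)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="ssd_mobilenet_v1_coco_2018_01_28/saved_model",
    conversion_params=params,
)
converter.convert()

# Save into the Triton model repository layout: <model-name>/<version>/model.savedmodel
converter.save("models/ssd_mobilenet_v1_trt/5/model.savedmodel")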

Environment to export the models:
l4t-tensorflow:r32.6.1-tf2.5-py3

Environment to run the models:
deepstream-l4t:6.0-triton

When I start deepstream-app, I get the log below.
I tried to debug with gdb, and the segfault seems to occur in libtriton.so.

root@host:/opt/nvidia/deepstream/deepstream-6.0/projects/project/branch/apps/app# deepstream-app -c config.txt
(Argus) Error FileOperationFailed: Connecting to nvargus-daemon failed: No such file or directory (in src/rpc/socket/client/SocketClientDispatch.cpp, function openSocketConnection(), line 205)
(Argus) Error FileOperationFailed: Cannot create camera provider (in src/rpc/socket/client/SocketClientDispatch.cpp, function createCameraProvider(), line 106)

(gst-plugin-scanner:27): GStreamer-WARNING **: 00:00:43.783: Failed to load plugin '/usr/lib/aarch64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_udp.so': librivermax.so.0: cannot open shared object file: No such file or directory

(deepstream-app:26): GLib-GObject-WARNING **: 00:00:44.137: g_object_set_is_valid_property: object class 'GstNvInferServer' has no property named 'input-tensor-meta'
0:00:01.573802028    26     0x1ae40360 WARN           nvinferserver gstnvinferserver_impl.cpp:284:validatePluginConfig:<primary_gie> warning: Configuration file batch-size reset to: 1
WARNING: backend.trt_is is deprecated. updated it to backend.triton
I1214 00:00:44.312613 26 shared_library.cc:108] OpenLibraryHandle: /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtriton_tensorflow1.so
2021-12-14 00:00:45.076182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
I1214 00:00:45.486738 26 tensorflow.cc:2171] TRITONBACKEND_Initialize: tensorflow
I1214 00:00:45.486797 26 tensorflow.cc:2184] Triton TRITONBACKEND API version: 1.4
I1214 00:00:45.486829 26 tensorflow.cc:2190] 'tensorflow' TRITONBACKEND API version: 1.4
I1214 00:00:45.486853 26 tensorflow.cc:2211] backend configuration:
{"cmdline":{"allow-soft-placement":"true","gpu-memory-fraction":"0.250000"}}
I1214 00:00:45.487114 26 shared_library.cc:108] OpenLibraryHandle: /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/onnxruntime/libtriton_onnxruntime.so
I1214 00:00:45.540140 26 onnxruntime.cc:1972] TRITONBACKEND_Initialize: onnxruntime
I1214 00:00:45.540347 26 onnxruntime.cc:1985] Triton TRITONBACKEND API version: 1.4
I1214 00:00:45.540481 26 onnxruntime.cc:1991] 'onnxruntime' TRITONBACKEND API version: 1.4
I1214 00:00:45.576490 26 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x101070000' with size 67108864
I1214 00:00:45.576725 26 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1214 00:00:45.578204 26 backend_factory.h:45] Create TritonBackendFactory
I1214 00:00:45.578280 26 plan_backend_factory.cc:49] Create PlanBackendFactory
I1214 00:00:45.578306 26 plan_backend_factory.cc:56] Registering TensorRT Plugins
I1214 00:00:45.578357 26 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I1214 00:00:45.578391 26 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I1214 00:00:45.578424 26 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I1214 00:00:45.578455 26 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I1214 00:00:45.578488 26 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I1214 00:00:45.578517 26 logging.cc:52] Registered plugin creator - ::Clip_TRT version 1
I1214 00:00:45.578549 26 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I1214 00:00:45.578604 26 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I1214 00:00:45.578628 26 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I1214 00:00:45.578648 26 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I1214 00:00:45.578670 26 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I1214 00:00:45.578708 26 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I1214 00:00:45.578729 26 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I1214 00:00:45.578751 26 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I1214 00:00:45.578774 26 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I1214 00:00:45.578803 26 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I1214 00:00:45.578834 26 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I1214 00:00:45.578864 26 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I1214 00:00:45.578894 26 logging.cc:52] Registered plugin creator - ::Proposal version 1
I1214 00:00:45.578920 26 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I1214 00:00:45.578941 26 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I1214 00:00:45.578973 26 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I1214 00:00:45.579003 26 logging.cc:52] Registered plugin creator - ::Split version 1
I1214 00:00:45.579027 26 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I1214 00:00:45.579065 26 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I1214 00:00:45.579094 26 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I1214 00:00:45.579179 26 server.cc:504] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1214 00:00:45.579309 26 server.cc:543] 
+-------------+------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------+
| Backend     | Path                                                                                           | Config                                                                       |
+-------------+------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------+
| tensorrt    | <built-in>                                                                                     | {}                                                                           |
| tensorflow  | /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtriton_tensorflow1.so | {"cmdline":{"allow-soft-placement":"true","gpu-memory-fraction":"0.250000"}} |
| onnxruntime | /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/onnxruntime/libtriton_onnxruntime.so | {}                                                                           |
+-------------+------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------+

I1214 00:00:45.579354 26 model_repository_manager.cc:570] BackendStates()
I1214 00:00:45.579394 26 server.cc:586] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I1214 00:00:45.579632 26 tritonserver.cc:1718] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                               |
| server_version                   | 2.13.0                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
|                                  | or_data statistics                                                                                                                                                   |
| model_repository_path[0]         | /opt/nvidia/deepstream/deepstream-6.0/projects/project/myproject/models                                                                                              |
| model_control_mode               | MODE_EXPLICIT                                                                                                                                                        |
| strict_model_config              | 1                                                                                                                                                                    |
| pinned_memory_pool_byte_size     | 67108864                                                                                                                                                             |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                             |
| min_supported_compute_capability | 5.3                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                   |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1214 00:00:45.579711 26 model_repository_manager.cc:570] BackendStates()
I1214 00:00:45.579758 26 model_repository_manager.cc:638] GetInferenceBackend() 'ssd_mobilenet_v1_trt' version 5
I1214 00:00:45.584830 26 model_repository_manager.cc:749] AsyncLoad() 'ssd_mobilenet_v1_trt'
I1214 00:00:45.585101 26 model_repository_manager.cc:988] TriggerNextAction() 'ssd_mobilenet_v1_trt' version 5: 1
I1214 00:00:45.585153 26 model_repository_manager.cc:1026] Load() 'ssd_mobilenet_v1_trt' version 5
I1214 00:00:45.585180 26 model_repository_manager.cc:1045] loading: ssd_mobilenet_v1_trt:5
I1214 00:00:45.686111 26 model_repository_manager.cc:1105] CreateInferenceBackend() 'ssd_mobilenet_v1_trt' version 5
I1214 00:00:45.689946 26 tensorflow.cc:2273] TRITONBACKEND_ModelInitialize: ssd_mobilenet_v1_trt (version 5)
I1214 00:00:45.701077 26 model_config_utils.cc:1524] ModelConfig 64-bit fields:
I1214 00:00:45.701188 26 model_config_utils.cc:1526] 	ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I1214 00:00:45.701261 26 model_config_utils.cc:1526] 	ModelConfig::dynamic_batching::max_queue_delay_microseconds
I1214 00:00:45.701328 26 model_config_utils.cc:1526] 	ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I1214 00:00:45.701369 26 model_config_utils.cc:1526] 	ModelConfig::ensemble_scheduling::step::model_version
I1214 00:00:45.701411 26 model_config_utils.cc:1526] 	ModelConfig::input::dims
I1214 00:00:45.701448 26 model_config_utils.cc:1526] 	ModelConfig::input::reshape::shape
I1214 00:00:45.701490 26 model_config_utils.cc:1526] 	ModelConfig::instance_group::secondary_devices::device_id
I1214 00:00:45.701531 26 model_config_utils.cc:1526] 	ModelConfig::model_warmup::inputs::value::dims
I1214 00:00:45.701569 26 model_config_utils.cc:1526] 	ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I1214 00:00:45.701605 26 model_config_utils.cc:1526] 	ModelConfig::optimization::cuda::graph_spec::input::value::dim
I1214 00:00:45.701645 26 model_config_utils.cc:1526] 	ModelConfig::output::dims
I1214 00:00:45.701678 26 model_config_utils.cc:1526] 	ModelConfig::output::reshape::shape
I1214 00:00:45.701712 26 model_config_utils.cc:1526] 	ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I1214 00:00:45.701744 26 model_config_utils.cc:1526] 	ModelConfig::sequence_batching::max_sequence_idle_microseconds
I1214 00:00:45.701776 26 model_config_utils.cc:1526] 	ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I1214 00:00:45.701807 26 model_config_utils.cc:1526] 	ModelConfig::version_policy::specific::versions
I1214 00:00:45.702544 26 tensorflow.cc:1427] model configuration:
{
    "name": "ssd_mobilenet_v1_trt",
    "platform": "tensorflow_savedmodel",
    "backend": "tensorflow",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1,
    "input": [
        {
            "name": "image_tensor",
            "data_type": "TYPE_UINT8",
            "format": "FORMAT_NONE",
            "dims": [
                300,
                300,
                3
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "num_detections",
            "data_type": "TYPE_FP32",
            "dims": [
                1
            ],
            "reshape": {
                "shape": []
            },
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "detection_classes",
            "data_type": "TYPE_FP32",
            "dims": [
                100
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "detection_scores",
            "data_type": "TYPE_FP32",
            "dims": [
                100
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "detection_boxes",
            "data_type": "TYPE_FP32",
            "dims": [
                100,
                4
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "execution_accelerators": {
            "gpu_execution_accelerator": [
                {
                    "name": "tensorrt",
                    "parameters": {
                        "max_workspace_size_bytes": "268435456",
                        "precision_mode": "FP16",
                        "minimum_segment_size": "10"
                    }
                }
            ],
            "cpu_execution_accelerator": []
        },
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "ssd_mobilenet_v1_trt_0",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "model.savedmodel",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
}
I1214 00:00:45.703470 26 tensorflow.cc:2323] TRITONBACKEND_ModelInstanceInitialize: ssd_mobilenet_v1_trt_0 (GPU device 0)
I1214 00:00:45.704920 26 backend_model_instance.cc:110] Creating instance ssd_mobilenet_v1_trt_0 on GPU 0 (6.2) using artifact 'model.savedmodel'
I1214 00:00:45.706709 26 tensorflow.cc:895] TensorRT Execution Accelerator is set for ssd_mobilenet_v1_trt
2021-12-14 00:00:45.706889: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/nvidia/deepstream/deepstream-6.0/projects/project/myproject/models/ssd_mobilenet_v1_trt/5/model.savedmodel
2021-12-14 00:00:45.873394: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-12-14 00:00:46.073559: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-12-14 00:00:46.074131: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ab389a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-12-14 00:00:46.074232: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-12-14 00:00:46.074756: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-12-14 00:00:46.075018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-12-14 00:00:46.075299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3
pciBusID: 0000:00:00.0
2021-12-14 00:00:46.075501: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-12-14 00:00:46.075735: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-12-14 00:00:46.078440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-12-14 00:00:46.086887: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-12-14 00:00:46.101865: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-12-14 00:00:46.112217: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-12-14 00:00:46.112483: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-12-14 00:00:46.112644: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-12-14 00:00:46.112889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-12-14 00:00:46.112989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-12-14 00:00:50.389704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-14 00:00:50.389806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
2021-12-14 00:00:50.389850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
2021-12-14 00:00:50.390040: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-12-14 00:00:50.390272: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-12-14 00:00:50.390468: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-12-14 00:00:50.390680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1962 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2021-12-14 00:00:50.399212: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ee007e960 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-12-14 00:00:50.399305: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2021-12-14 00:00:50.768844: I tensorflow/cc/saved_model/loader.cc:251] Restoring SavedModel bundle.
2021-12-14 00:00:51.502181: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 4 ops of 3 different types in the graph that are not converted to TensorRT: StatefulPartitionedCall, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2021-12-14 00:00:51.502330: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 0
2021-12-14 00:00:51.663961: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-14 00:00:52.347699: I tensorflow/cc/saved_model/loader.cc:200] Running initialization op on SavedModel bundle at path: /opt/nvidia/deepstream/deepstream-6.0/projects/project/myproject/models/ssd_mobilenet_v1_trt/5/model.savedmodel
2021-12-14 00:00:52.923749: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 4 ops of 2 different types in the graph that are not converted to TensorRT: PartitionedCall, NoOp, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2021-12-14 00:00:52.923876: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 0
2021-12-14 00:00:53.090327: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-14 00:00:53.097324: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-14 00:00:53.213619: I tensorflow/cc/saved_model/loader.cc:379] SavedModel load for tags { serve }; Status: success. Took 7506710 microseconds.
I1214 00:00:53.214262 26 dynamic_batch_scheduler.cc:230] Starting dynamic-batch scheduler thread 0 at nice 5...
I1214 00:00:53.214447 26 model_repository_manager.cc:1212] successfully loaded 'ssd_mobilenet_v1_trt' version 5
I1214 00:00:53.214495 26 model_repository_manager.cc:988] TriggerNextAction() 'ssd_mobilenet_v1_trt' version 5: 0
I1214 00:00:53.214523 26 model_repository_manager.cc:1003] no next action, trigger OnComplete()
I1214 00:00:53.214696 26 model_repository_manager.cc:594] VersionStates() 'ssd_mobilenet_v1_trt'
I1214 00:00:53.214744 26 model_repository_manager.cc:594] VersionStates() 'ssd_mobilenet_v1_trt'
I1214 00:00:53.214773 26 model_repository_manager.cc:638] GetInferenceBackend() 'ssd_mobilenet_v1_trt' version 5
INFO: TrtISBackend id:1 initialized model: ssd_mobilenet_v1_trt
I1214 00:00:54.246747 26 model_repository_manager.cc:638] GetInferenceBackend() 'ssd_mobilenet_v1_trt' version 5
I1214 00:00:54.246859 26 infer_request.cc:524] prepared: [0x0x1ac7fda0] request id: 0, model: ssd_mobilenet_v1_trt, requested version: 5, actual version: 5, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x1c2c5928] input: image_tensor, type: UINT8, original shape: [1,300,300,3], batch + shape: [1,300,300,3], shape: [300,300,3]
override inputs:
inputs:
[0x0x1c2c5928] input: image_tensor, type: UINT8, original shape: [1,300,300,3], batch + shape: [1,300,300,3], shape: [300,300,3]
original requested outputs:
detection_boxes
detection_classes
detection_scores
num_detections
requested outputs:
detection_boxes
detection_classes
detection_scores
num_detections

I1214 00:00:54.247060 26 tensorflow.cc:2394] model ssd_mobilenet_v1_trt, instance ssd_mobilenet_v1_trt_0, executing 1 requests
I1214 00:00:54.247175 26 tensorflow.cc:1567] TRITONBACKEND_ModelExecute: Running ssd_mobilenet_v1_trt_0 with 1 requests
I1214 00:00:54.250084 26 tensorflow.cc:1820] TRITONBACKEND_ModelExecute: input 'image_tensor' is GPU tensor: false
2021-12-14 00:00:55.052147: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 4 ops of 3 different types in the graph that are not converted to TensorRT: StatefulPartitionedCall, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2021-12-14 00:00:55.052295: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 0
2021-12-14 00:00:55.198460: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-14 00:00:55.203888: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-14 00:00:55.599404: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
./app1-jetson.sh: line 3:    26 Segmentation fault      (core dumped) deepstream-app -c app1_toplevel_jetson.txt

Hi,

Did you convert the TF-TRT model on the TX2?

Please note that the TensorRT engine is not portable.
You will need to convert it on the target platform directly.

Thanks.

Hi @AastaLLL

Yes, I converted the model on the TX2.

I forgot to mention that the TX2 has

  • Python 3.6.9
  • JetPack 4.6
  • TensorRT 8.0.1
  • CUDA 10.2
  • cuDNN 8.2
  • GStreamer 1.14.5
  • Triton 2.13

As much as I would like to figure out what’s going on with Triton Server and the model, I can’t solve it.
Here is the gdb output.

Thread 36 "deepstream-app" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ea4ff8d70 (LWP 54)]
0x0000007f0ff42070 in tensorflow::{lambda(tensorflow::shape_inference::InferenceContext*)#9}::_FUN(tensorflow::shape_inference::InferenceContext*) ()
   from /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
(gdb) bt
#0  0x0000007f0ff42070 in tensorflow::{lambda(tensorflow::shape_inference::InferenceContext*)#9}::_FUN(tensorflow::shape_inference::InferenceContext*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#1  0x0000007f09ce2a7c in std::_Function_handler<tensorflow::Status (tensorflow::shape_inference::InferenceContext*), tensorflow::Status (*)(tensorflow::shape_inference::InferenceContext*)>::_M_invoke(std::_Any_data const&, tensorflow::shape_inference::InferenceContext*&&) () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#2  0x0000007f102a40d4 in tensorflow::shape_inference::InferenceContext::Run(std::function<tensorflow::Status (tensorflow::shape_inference::InferenceContext*)> const&) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#3  0x0000007f1010ff38 in tensorflow::grappler::SymbolicShapeRefiner::InferShapes(tensorflow::NodeDef const&, tensorflow::grappler::SymbolicShapeRefiner::NodeContext*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#4  0x0000007f10115c74 in tensorflow::grappler::SymbolicShapeRefiner::UpdateNode(tensorflow::NodeDef const*, bool*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#5  0x0000007f10116e68 in tensorflow::grappler::GraphProperties::UpdateShapes(tensorflow::grappler::SymbolicShapeRefiner*, std::unordered_map<tensorflow::NodeDef const*, tensorflow::NodeDef const*, std::hash<tensorflow::NodeDef const*>, std::equal_to<tensorflow::NodeDef const*>, std::allocator<std::pair<tensorflow::NodeDef const* const, tensorflow::NodeDef const*> > > const&, tensorflow::NodeDef const*, bool*) const () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#6  0x0000007f10117008 in tensorflow::grappler::GraphProperties::PropagateShapes(tensorflow::grappler::SymbolicShapeRefiner*, tensorflow::grappler::TopoQueue*, std::unordered_map<tensorflow::NodeDef const*, tensorflow::NodeDef const*, std::hash<tensorflow::NodeDef const*>, std::equal_to<tensorflow::NodeDef const*>, std::allocator<std::pair<tensorflow::NodeDef const* const, tensorflow::NodeDef const*> > > const&, int) const () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#7  0x0000007f1011177c in tensorflow::grappler::GraphProperties::InferStatically(bool, bool, bool, bool) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#8  0x0000007f100a70a0 in tensorflow::grappler::ConstantFolding::RunOptimizationPass(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#9  0x0000007f100a7cc8 in tensorflow::grappler::ConstantFolding::Optimize(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#10 0x0000007f0ffc1640 in tensorflow::grappler::MetaOptimizer::RunOptimizer(tensorflow::grappler::GraphOptimizer*, tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem*, tensorflow::GraphDef*, tensorflow::grappler::MetaOptimizer::GraphOptimizationResult*) () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#11 0x0000007f0ffc279c in tensorflow::grappler::MetaOptimizer::OptimizeGraph(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#12 0x0000007f0ffc4b9c in tensorflow::grappler::MetaOptimizer::Optimize(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem const&, tensorflow::GraphDef*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#13 0x0000007f0ffc58dc in tensorflow::grappler::RunMetaOptimizer(tensorflow::grappler::GrapplerItem const&, tensorflow::ConfigProto const&, tensorflow::DeviceBase*, tensorflow::grappler::Cluster*, tensorflow::GraphDef*) () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#14 0x0000007f0ffb7f34 in tensorflow::GraphExecutionState::OptimizeGraph(tensorflow::BuildGraphOptions const&, std::unique_ptr<tensorflow::Graph, std::default_delete<tensorflow::Graph> >*, std::unique_ptr<tensorflow::FunctionLibraryDefinition, std::default_delete<tensorflow::FunctionLibraryDefinition> >*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#15 0x0000007f0ffb967c in tensorflow::GraphExecutionState::BuildGraph(tensorflow::BuildGraphOptions const&, std::unique_ptr<tensorflow::ClientGraph, std::default_delete<tensorflow::ClientGraph> >*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#16 0x0000007f0febf6c8 in tensorflow::DirectSession::CreateGraphs(tensorflow::BuildGraphOptions const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<tensorflow::Graph, std::default_delete<tensorflow::Graph> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unique_ptr<tensorflow::Graph, std::default_delete<tensorflow::Graph> > > > >*, std::unique_ptr<tensorflow::FunctionLibraryDefinition, std::default_delete<tensorflow::FunctionLibraryDefinition> >*, tensorflow::DirectSession::RunStateArgs*, absl::InlinedVector<tensorflow::DataType, 4ul, std::allocator<tensorflow::DataType> >*, absl::InlinedVector<tensorflow::DataType, 4ul, std::allocator<tensorflow::DataType> >*, long long*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#17 0x0000007f0fec05e0 in tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys> >*, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_delete<tensorflow::DirectSession::FunctionInfo> >*, tensorflow::DirectSession::RunStateArgs*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#18 0x0000007f0fec2020 in tensorflow::DirectSession::GetOrCreateExecutors(absl::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, absl::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, absl::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, tensorflow::DirectSession::ExecutorsAndKeys**, tensorflow::DirectSession::RunStateArgs*) () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#19 0x0000007f0fec30d4 in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#20 0x0000007f0fec2e98 in tensorflow::DirectSession::Run(std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pai---Type <return> to continue, or q <return> to quit---
r<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#21 0x0000007f09d7d7c0 in (anonymous namespace)::ModelImpl::Run(tritontf_tensorlist_struct*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, tritontf_tensorlist_struct**) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#22 0x0000007f09d7dc94 in TRITONTF_ModelRun () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtensorflow_triton.so.1
#23 0x0000007f446fcf18 in triton::backend::tensorflow::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtriton_tensorflow1.so
#24 0x0000007f446ff594 in TRITONBACKEND_ModelInstanceExecute () at /opt/nvidia/deepstream/deepstream-6.0/lib/triton_backends/tensorflow1/libtriton_tensorflow1.so
#25 0x0000007f4de7370c in std::_Function_handler<void (unsigned int, std::vector<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> >, std::allocator<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> > > >&&), nvidia::inferenceserver::TritonModel::Create(nvidia::inferenceserver::InferenceServer*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, inference::ModelConfig const&, std::unique_ptr<nvidia::inferenceserver::TritonModel, std::default_delete<nvidia::inferenceserver::TritonModel> >*)::{lambda(unsigned int, std::vector<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> >, std::allocator<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, 
std::default_delete<nvidia::inferenceserver::InferenceRequest> > > >&&)#2}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::vector<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> >, std::allocator<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> > > >&&) () at /opt/nvidia/deepstream/deepstream-6.0/lib/libtritonserver.so
#26 0x0000007f4dcd35c8 in nvidia::inferenceserver::DynamicBatchScheduler::SchedulerThread(unsigned int, int, std::shared_ptr<std::atomic<bool> > const&, std::promise<bool>*) ()
    at /opt/nvidia/deepstream/deepstream-6.0/lib/libtritonserver.so
#27 0x0000007f84675e94 in  () at /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#28 0x0000007f90331088 in start_thread () at /lib/aarch64-linux-gnu/libpthread.so.0
#29 0x0000007f901d0ffc in  () at /lib/aarch64-linux-gnu/libc.so.6

Hi @AastaLLL

I have an update.

On a dGPU container, with the model reconverted there as well, I run the Triton Server with

tritonserver --model-store models --strict-model-config=true --log-verbose=3 --model-control-mode=explicit --load-model=ssd_model

and it stays waiting for requests with the model in the [READY] state.

Then, with a client based on the simple_http_infer_client.py example, I request an inference from the model.
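
The request is essentially the following (a simplified sketch based on the tritonclient HTTP API; the server URL and the dummy input are placeholders):

import numpy as np
import tritonclient.http as httpclient

# Connect to the local Triton server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy uint8 batch matching the model's image_tensor input.
image = np.zeros((1, 432, 768, 3), dtype=np.uint8)

inputs = [httpclient.InferInput("image_tensor", list(image.shape), "UINT8")]
inputs[0].set_data_from_numpy(image)

outputs = [
    httpclient.InferRequestedOutput("detection_boxes"),
    httpclient.InferRequestedOutput("detection_classes"),
    httpclient.InferRequestedOutput("detection_scores"),
]

# The server segfaults while handling this call.
result = client.infer("ssd_model", inputs, outputs=outputs)
print(result.as_numpy("detection_scores"))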

It segfaults, this time without DeepStream acting as a client.

I1215 21:14:43.937407 218 http_server.cc:2715] HTTP request: 2 /v2/models/ssd_model/infer
I1215 21:14:43.937438 218 model_repository_manager.cc:638] GetInferenceBackend() 'ssd_model' version -1
I1215 21:14:43.937446 218 model_repository_manager.cc:638] GetInferenceBackend() 'ssd_model' version -1
I1215 21:14:44.091723 218 infer_request.cc:524] prepared: [0x0x7f0d980025c0] request id: , model: ssd_model, requested version: -1, actual version: 5, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7f0d98002b98] input: image_tensor, type: UINT8, original shape: [1,432,768,3], batch + shape: [1,432,768,3], shape: [1,432,768,3]
override inputs:
inputs:
[0x0x7f0d98002b98] input: image_tensor, type: UINT8, original shape: [1,432,768,3], batch + shape: [1,432,768,3], shape: [1,432,768,3]
original requested outputs:
detection_boxes
detection_classes
detection_scores
requested outputs:
detection_boxes
detection_classes
detection_scores

I1215 21:14:44.091792 218 tensorflow.cc:2389] model ssd_model, instance ssd_model_0, executing 1 requests
I1215 21:14:44.091804 218 tensorflow.cc:1563] TRITONBACKEND_ModelExecute: Running ssd_model_0 with 1 requests
I1215 21:14:44.092192 218 tensorflow.cc:1815] TRITONBACKEND_ModelExecute: input 'image_tensor' is GPU tensor: false
2021-12-15 21:14:44.342181: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:486] There are 4 ops of 3 different types in the graph that are not converted to TensorRT: StatefulPartitionedCall, Placeholder, NoOp, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2021-12-15 21:14:44.342232: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 0
2021-12-15 21:14:44.371577: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-15 21:14:44.372440: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-15 21:14:44.451053: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2021-12-15 21:14:44.512876: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
Segmentation fault (core dumped)

So, this is not a Jetson device or DeepStream problem. It is a Triton Server, TensorFlow, TRT problem.

Thanks!!!

Hi,

Thanks for the testing and feedback.
Would you mind trying the following experiments as well?

First, it looks like you have enabled the TF-TRT config.
Please check whether the model works with the Triton server in pure TensorFlow mode (turn off the TF-TRT optimization).

Next, we also have a prebuilt TensorFlow package for Jetson.
Could you check whether the model works with the TensorFlow library directly?
https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html

Thanks.

Hi @AastaLLL
I tested the suggested experiments:

  1. I disabled the TF-TRT optimization by commenting out this section of the Triton model config.
    Is this the correct way to disable TRT mode? I followed this page: Triton optimization.
    The result is still a segfault.
#optimization { execution_accelerators {
#  gpu_execution_accelerator {
#    name : "tensorrt"
#    parameters { key: "precision_mode" value: "FP16" }
#} } }
  2. Regarding the TensorFlow package: I have previously found that this behavior (segfault) is common across dGPU and Jetson architectures. It is important to note that this model worked just fine before the TF-TRT optimization, both on dGPU and on Jetson combined with Triton. I think this issue is arch-independent; what do you think?
    In addition, after running the export script, I ran a dummy inference test in the TensorFlow container and got no errors; I can see the output nodes (see the sketch below).
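
That dummy test is essentially loading the exported SavedModel and running one inference on dummy data, along these lines (a sketch; the path is a placeholder, and the signature input key is assumed to match the image_tensor name from my Triton config):

import numpy as np
import tensorflow as tf

# Load the TF-TRT-converted SavedModel and grab its serving signature.
model = tf.saved_model.load("models/ssd_mobilenet_v1_trt/5/model.savedmodel")
infer = model.signatures["serving_default"]
print(infer.structured_input_signature)   # shows the actual input key and dtype

# Run one inference on a dummy uint8 image (input key assumed to be 'image_tensor').
image = tf.constant(np.zeros((1, 300, 300, 3), dtype=np.uint8))
outputs = infer(image_tensor=image)

# Print the output node names and shapes.
for name, tensor in outputs.items():
    print(name, tensor.shape)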

Thanks!

Yes. It sounds like some issue with the optimization.

We are going to reproduce this internally.
Will share more information with you later.

Hi,

Would you mind also sharing with us the script you use to load ssd_mobilenet_v1 and convert it into a TF-TRT model?

Thanks.

Hi @AastaLLL
I sent you the information in a private message.
For people reading this, the references and resources to reproduce the issue are:
tf1_detection_zoo.md
TensorRT/tree/main/quickstart
nvidia/containers/tensorflow
https://github.com/triton-inference-server/client
Thanks!

Hi,

We tested the ssd_mobilenet_v1_coco_2018_01_28 model with the deepstream-l4t:6.0-triton container.
The pipeline runs correctly on TX2 without a segmentation fault.

Is there anything missing in our testing?
Below is our procedure for your reference.

$ export DISPLAY=:0
$ xhost +
$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/deepstream-l4t:6.0-triton
$ mkdir -p /opt/nvidia/deepstream/deepstream-6.0/samples/triton_model_repo/ssd_mobilenet_v1_coco_2018_01_28/1
$ wget -O /tmp/ssd_mobilenet_v1_coco_2018_01_28.tar.gz http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz
$ cd /tmp && tar xzf ssd_mobilenet_v1_coco_2018_01_28.tar.gz
$ mv /tmp/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb /opt/nvidia/deepstream/deepstream-6.0/samples/triton_model_repo/ssd_mobilenet_v1_coco_2018_01_28/1/
$ rm -fr /tmp/ssd_mobilenet_v1_coco_2018_01_28.tar.gz  /tmp/ssd_mobilenet_v1_coco_2018_01_28
$ cd -

Update the configs: source1_primary_detector.txt (3.7 KB), config.pbtxt (2.6 KB)

$ export DISPLAY=:0
$ cd samples/configs/deepstream-app-triton/
$ deepstream-app -c source1_primary_detector.txt

Thanks.


Hi @AastaLLL

Very interesting…

In your procedure, there is no separate step where a script converts the pure TF model to a TF-TRT model.
I didn’t know it was possible to use a pure TF model and have Triton apply the TF-TRT optimization online by adding this block, as we saw earlier:

optimization { execution_accelerators {
  gpu_execution_accelerator {
    name : "tensorrt"
...
...

Maybe I was wrong in trying to optimize the model with a script.

Let me check whether your procedure works, and I will get back to you.

Thanks!!!

Thanks @AastaLLL , it works.

Q: Are the weights/engines generated by this optimization procedure saved, or does this have to ‘warm up’ every time?

Thanks

Hi,

This follows the Triton mechanism.
You can use Triton’s model warmup (the model_warmup section of config.pbtxt) to avoid the model startup/optimization slowdown.

Thanks.

