The issue still occurs after uninstalling the previous torch, reinstalling 2.0.1, and verifying that version 2.0.1 is installed.
To make sure, I reproduced it using the steps below, and this time I attached the log output of the successful run, hoping it helps identify the issue.
- Dockerfile:
RUN ./ --build-bindings
- Checked
pip list | grep torch
- no torch is installed.
- Installing torch:
pip install torch==2.0.1
- Installed model repo using:
- (from
- Ran the SSD parser test app successfully:
python3 /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
- (from
/opt/nvidia/deepstream/deepstream-6.4/sources/deepstream_python_apps/apps/deepstream-ssd-parser# python3 /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
(gst-plugin-scanner:313): GStreamer-WARNING **: 09:08:34.273: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/': cannot open shared object file: No such file or directory
WARNING: infer_proto_utils.cpp:201 backend.trt_is is deprecated. updated it to backend.triton
I0214 09:08:34.783982 309] TRITONBACKEND_Initialize: pytorch
I0214 09:08:34.784002 309] Triton TRITONBACKEND API version: 1.15
I0214 09:08:34.784005 309] 'pytorch' TRITONBACKEND API version: 1.15
I0214 09:08:34.862168 309] Pinned memory pool is created at '0x7f9a10000000' with size 268435456
I0214 09:08:34.862430 309] CUDA memory pool is created on device 0 with size 67108864
I0214 09:08:34.875914 309] loading: ssd_inception_v2_coco_2018_01_28:1
I0214 09:08:35.046489 309] TRITONBACKEND_Initialize: tensorflow
I0214 09:08:35.046513 309] Triton TRITONBACKEND API version: 1.15
I0214 09:08:35.046518 309] 'tensorflow' TRITONBACKEND API version: 1.15
I0214 09:08:35.046522 309] backend configuration:
I0214 09:08:35.046862 309] TRITONBACKEND_ModelInitialize: ssd_inception_v2_coco_2018_01_28 (version 1)
I0214 09:08:35.048143 309] TRITONBACKEND_ModelInstanceInitialize: ssd_inception_v2_coco_2018_01_28_0_0 (GPU device 0)
2024-02-14 09:08:35.053251: I tensorflow/core/platform/] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-14 09:08:35.054007: I tensorflow/compiler/xla/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
2024-02-14 09:08:35.055821: I tensorflow/compiler/xla/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
2024-02-14 09:08:35.055956: I tensorflow/compiler/xla/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
2024-02-14 09:08:35.056182: I tensorflow/compiler/xla/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
2024-02-14 09:08:35.056318: I tensorflow/compiler/xla/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
2024-02-14 09:08:35.056440: I tensorflow/compiler/xla/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
2024-02-14 09:08:35.056559: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4403 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:07:00.0, compute capability: 7.5
2024-02-14 09:08:35.156000: I tensorflow/compiler/mlir/] MLIR V1 optimization pass is not enabled
I0214 09:08:35.198216 309] successfully loaded 'ssd_inception_v2_coco_2018_01_28'
INFO: infer_trtis_backend.cpp:218 TrtISBackend id:5 initialized model: ssd_inception_v2_coco_2018_01_28
2024-02-14 09:08:37.422963: I tensorflow/compiler/xla/stream_executor/cuda/] failed to allocate 4.30GiB (4617351936 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2024-02-14 09:08:37.578745: I tensorflow/compiler/xla/stream_executor/cuda/] Loaded cuDNN version 8904
Frame Number=0 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=1 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=2 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=3 Number of Objects=5 Vehicle_count=2 Person_count=2
. . .
Frame Number=1438 Number of Objects=4 Vehicle_count=4 Person_count=0
Frame Number=1439 Number of Objects=5 Vehicle_count=4 Person_count=1
Frame Number=1440 Number of Objects=6 Vehicle_count=5 Person_count=1
Frame Number=1441 Number of Objects=0 Vehicle_count=0 Person_count=0
I0214 09:09:05.946420 309] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0214 09:09:05.946431 309] Waiting for in-flight requests to complete.
I0214 09:09:05.946451 309] Timeout 30: Found 0 model versions that have in-flight inferences
I0214 09:09:05.946456 309] All models are stopped, unloading models
I0214 09:09:05.946463 309] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0214 09:09:05.946499 309] TRITONBACKEND_ModelFinalize: delete model state
I0214 09:09:05.963695 309] successfully unloaded 'ssd_inception_v2_coco_2018_01_28' version 1
I0214 09:09:06.946560 309] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
- Edit the .py file: add import torch as the first or last import.
- Rerun: it fails immediately.
/opt/nvidia/deepstream/deepstream-6.4/sources/deepstream_python_apps/apps/deepstream-ssd-parser# python3 /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
WARNING: infer_proto_utils.cpp:201 backend.trt_is is deprecated. updated it to backend.triton
I0214 09:15:00.493329 442] Collecting metrics for GPU 0: NVIDIA GeForce RTX 2080 Ti
I0214 09:15:00.493481 442] Collecting CPU metrics
I0214 09:15:00.493587 442] No server context available. Exiting immediately.
ERROR: infer_trtis_server.cpp:994 Triton: failed to create repo server, triton_err_str:Not found, err_msg:unable to load shared library: /opt/tritonserver/backends/pytorch/ undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
ERROR: infer_trtis_server.cpp:840 failed to initialize trtserver on repo dir: root: "/opt/nvidia/deepstream/deepstream-6.4/samples/triton_model_repo"
log_level: 2
tf_gpu_memory_fraction: 0.4
0:00:00.121740825 442 0x556af32e8500 ERROR nvinferserver gstnvinferserver.cpp:408:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 5]: Error in createNNBackend() <infer_trtis_context.cpp:256> [UID = 5]: model:ssd_inception_v2_coco_2018_01_28 get triton server instance failed. repo:root: "/opt/nvidia/deepstream/deepstream-6.4/samples/triton_model_repo"
log_level: 2
tf_gpu_memory_fraction: 0.4
0:00:00.121762726 442 0x556af32e8500 ERROR nvinferserver gstnvinferserver.cpp:408:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 5]: Error in initialize() <infer_base_context.cpp:79> [UID = 5]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRITON_ERROR
0:00:00.121772445 442 0x556af32e8500 WARN nvinferserver gstnvinferserver_impl.cpp:592:start:<primary-inference> error: Failed to initialize InferTrtIsContext
0:00:00.121778817 442 0x556af32e8500 WARN nvinferserver gstnvinferserver_impl.cpp:592:start:<primary-inference> error: Config file path: dstest_ssd_nopostprocess.txt
0:00:00.122079852 442 0x556af32e8500 WARN nvinferserver gstnvinferserver.cpp:518:gst_nvinfer_server_start:<primary-inference> error: gstnvinferserver_impl start failed
Error: gst-resource-error-quark: Failed to initialize InferTrtIsContext (1): gstnvinferserver_impl.cpp(592): start (): /GstPipeline:pipeline0/GstNvInferServer:primary-inference:
Config file path: dstest_ssd_nopostprocess.txt
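
In case it helps narrow this down, here is a minimal sketch of the checks I can run from the same Python process. The helper names (`installed_version`, `uses_cxx11_abi`, `loaded_libraries`) are my own and not part of DeepStream or Triton. The `NSt7__cxx1112basic_string` part of the missing symbol is the mangled form of `std::__cxx11::basic_string`, so the Triton pytorch backend is asking for a C++11-ABI variant of `c10::detail::torchCheckFail`. My assumption (not confirmed) is that the pip torch wheel was built with a different ABI setting, and that `import torch` loads its own libc10 into the process before Triton loads its backend.

```python
import importlib.metadata
import os

# Mangled name copied from the Triton error above.
MISSING_SYMBOL = (
    "_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)


def installed_version(package):
    """Return the installed version of a pip package, or None if absent.

    Reads package metadata only, so torch itself is never imported.
    """
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def uses_cxx11_abi(mangled):
    """True if the mangled name refers to std::__cxx11::basic_string."""
    return "NSt7__cxx1112basic_string" in mangled


def loaded_libraries(keywords=("libc10", "libtorch"), maps_path="/proc/self/maps"):
    """List shared objects mapped into this process whose file name
    contains one of the keywords.

    Linux only; returns an empty list when the maps file is unavailable.
    """
    if not os.path.exists(maps_path):
        return []
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            # Format: address perms offset dev inode pathname
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue  # anonymous mapping, no pathname
            path = fields[5].strip()
            if any(k in os.path.basename(path) for k in keywords):
                found.add(path)
    return sorted(found)


if __name__ == "__main__":
    print("torch version (pip metadata):", installed_version("torch"))
    print("missing symbol uses C++11 ABI:", uses_cxx11_abi(MISSING_SYMBOL))
    print("torch libraries already loaded:", loaded_libraries())
```

Calling `loaded_libraries()` right after the `import torch` line in the test app should show whether a libc10 from the pip site-packages directory is already mapped before the pipeline starts. If torch is importable, `torch.compiled_with_cxx11_abi()` reports how that wheel was built, which can be compared against the `__cxx11` marker in the symbol above.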