DeepStream Python SSD : Not utilising GPU and it is slow

epratheeban · May 19, 2021, 7:07am

Device : Jetson Xavier NX
Jetpack : JetPack 4.5.1 [L4T 32.5.1]

I tried to run the sample Python app from here:
[DeepStream Python SSD apps](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser)

I followed the instructions in the above repository.

The GPU instance group was set as follows:
instance_group {
kind: KIND_GPU
count: 1
gpus: 0
}

Here is the log from when the model is loaded:

thukhi@thukhi:/opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_python_apps/apps/deepstream-ssd-parser$ sudo python3 deepstream_ssd_parser.py ../../../../samples/streams/sample_720p.h264
Creating Pipeline 
 
Creating Source
Creating H264Parser
Creating Decoder
Creating NvStreamMux
Creating Nvinferserver
Creating Nvvidconv
Creating OSD (nvosd)
Creating Queue
Creating Converter 2 (nvvidconv2)
Creating capsfilter
Creating Encoder
Creating Code Parser
Creating Container
Creating Sink
Playing file ../../../../samples/streams/sample_720p.h264 
Adding elements to Pipeline 

Linking elements in the Pipeline 

Starting pipeline 

Opening in BLOCKING MODE
Opening in BLOCKING MODE 
I0519 06:46:02.314171 15820 pinned_memory_manager.cc:199] Pinned memory pool is created at '0x2030ba000' with size 67108864
I0519 06:46:02.314522 15820 cuda_memory_manager.cc:99] CUDA memory pool is created on device 0 with size 67108864
I0519 06:46:02.317155 15820 server.cc:141] 
+---------+--------+------+
| Backend | Config | Path |
+---------+--------+------+
+---------+--------+------+

I0519 06:46:02.317284 15820 server.cc:184] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0519 06:46:02.317763 15820 tritonserver.cc:1620] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                          |
+----------------------------------+----------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                         |
| server_version                   | 2.5.0                                                                                                          |
| server_extensions                | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_ |
|                                  | memory binary_tensor_data statistics                                                                           |
| model_repository_path[0]         | /opt/nvidia/deepstream/deepstream-5.1/samples/trtis_model_repo                                                 |
| model_control_mode               | MODE_EXPLICIT                                                                                                  |
| strict_model_config              | 0                                                                                                              |
| pinned_memory_pool_byte_size     | 67108864                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                       |
| min_supported_compute_capability | 5.3                                                                                                            |
| strict_readiness                 | 1                                                                                                              |
| exit_timeout                     | 30                                                                                                             |
+----------------------------------+----------------------------------------------------------------------------------------------------------------+

I0519 06:46:02.321476 15820 model_repository_manager.cc:810] loading: ssd_inception_v2_coco_2018_01_28:1
2021-05-19 08:46:03.025265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
I0519 06:46:03.543636 15820 tensorflow.cc:1876] TRITONBACKEND_Initialize: tensorflow
I0519 06:46:03.543728 15820 tensorflow.cc:1889] Triton TRITONBACKEND API version: 1.0
I0519 06:46:03.543797 15820 tensorflow.cc:1895] 'tensorflow' TRITONBACKEND API version: 1.0
I0519 06:46:03.543833 15820 tensorflow.cc:1916] backend configuration:
{"cmdline":{"allow-soft-placement":"true","gpu-memory-fraction":"0.400000"}}
I0519 06:46:03.544064 15820 tensorflow.cc:1978] TRITONBACKEND_ModelInitialize: ssd_inception_v2_coco_2018_01_28 (version 1)
I0519 06:46:03.549827 15820 tensorflow.cc:2028] TRITONBACKEND_ModelInstanceInitialize: ssd_inception_v2_coco_2018_01_28_0 (GPU device 0)
2021-05-19 08:46:13.322728: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2021-05-19 08:46:13.324289: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f400508b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-19 08:46:13.324404: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-05-19 08:46:13.324787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-05-19 08:46:13.325027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-05-19 08:46:13.325221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2021-05-19 08:46:13.325322: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2021-05-19 08:46:13.325506: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-05-19 08:46:13.347415: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-19 08:46:13.386696: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-19 08:46:13.406021: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-05-19 08:46:13.432913: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-05-19 08:46:13.433305: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-19 08:46:13.433513: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-05-19 08:46:13.433738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-05-19 08:46:13.433826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-05-19 08:46:13.434233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-19 08:46:13.434330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
2021-05-19 08:46:13.434400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
2021-05-19 08:46:13.434682: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-05-19 08:46:13.434913: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-05-19 08:46:13.435118: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2021-05-19 08:46:13.435317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3106 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-05-19 08:46:13.441723: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f40057b60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-19 08:46:13.441828: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
I0519 06:46:15.267015 15820 model_repository_manager.cc:983] successfully loaded 'ssd_inception_v2_coco_2018_01_28' version 1
INFO: TrtISBackend id:5 initialized model: ssd_inception_v2_coco_2018_01_28
2021-05-19 08:46:28.736767: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-19 08:47:07.804698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
Frame Number=0 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=1 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=2 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=3 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=4 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=5 Number of Objects=5 Vehicle_count=2 Person_count=2
Frame Number=6 Number of Objects=5 Vehicle_count=2 Person_count=2

The log shows that the model is loaded on the GPU, but the GPU does not appear to be utilized while the pipeline runs, and inference is slow.

Could you please help?

Thanks in advance.
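One way to put a number on "slow" is to time the per-frame output. The following is a minimal sketch, assuming a hypothetical `FpsCounter` helper whose `tick()` would be called once per frame from wherever the `Frame Number=` lines are printed. It is not part of the sample app, and the injectable `clock` parameter is only there to make it testable.

```python
import time


class FpsCounter:
    """Hypothetical helper: counts frames and reports the average frame rate."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable time source (assumption, for testing)
        self._start = None    # time of the first frame
        self._last = None     # time of the most recent frame
        self._frames = 0

    def tick(self):
        """Record one processed frame."""
        now = self._clock()
        if self._start is None:
            self._start = now
        self._last = now
        self._frames += 1

    def fps(self):
        """Average frames per second between the first and the latest frame.

        Returns 0.0 until at least two frames with distinct times were seen.
        """
        if self._frames < 2 or self._last == self._start:
            return 0.0
        return (self._frames - 1) / (self._last - self._start)
```

Calling `tick()` once per frame and printing `fps()` every few hundred frames would give a throughput figure that can be compared between runs.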

AastaLLL · May 19, 2021, 8:25am

Thanks for your question.

We are trying to reproduce this issue internally
and will share more information with you later.

epratheeban · May 19, 2021, 8:16pm

Thanks, awaiting your reply!

AastaLLL · May 20, 2021, 5:29am

Hi,

This is the tegrastats output from our testing:

...
... EMC_FREQ 23%@1600 GR3D_FREQ 4%@1109 
... EMC_FREQ 23%@1600 GR3D_FREQ 51%@1109 
... EMC_FREQ 23%@1600 GR3D_FREQ 90%@1109 
... EMC_FREQ 23%@1600 GR3D_FREQ 82%@1109 
... EMC_FREQ 23%@1600 GR3D_FREQ 52%@1109 
... EMC_FREQ 22%@1600 GR3D_FREQ 11%@1109 
... EMC_FREQ 22%@1600 GR3D_FREQ 1%@1109 
... EMC_FREQ 23%@1600 GR3D_FREQ 2%@1109 
...

The sample does use the GPU, but it does not always occupy all of the GPU's resources.
The reason might be data bandwidth or the TensorFlow implementation.
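To compare numbers rather than eyeball them, the GR3D_FREQ field of the tegrastats lines above can be extracted and summarised. This is a minimal sketch: the function names `gpu_utilisation` and `summarise` are hypothetical, and the sample text reuses only the fields shown in this post (the elided leading part of each line stays as `...`).

```python
import re

# Matches the GPU load field of a tegrastats line, e.g. "GR3D_FREQ 51%@1109".
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")

# The tegrastats lines quoted above, with the elided prefix kept as "...".
SAMPLE = """\
... EMC_FREQ 23%@1600 GR3D_FREQ 4%@1109
... EMC_FREQ 23%@1600 GR3D_FREQ 51%@1109
... EMC_FREQ 23%@1600 GR3D_FREQ 90%@1109
... EMC_FREQ 23%@1600 GR3D_FREQ 82%@1109
... EMC_FREQ 23%@1600 GR3D_FREQ 52%@1109
... EMC_FREQ 22%@1600 GR3D_FREQ 11%@1109
... EMC_FREQ 22%@1600 GR3D_FREQ 1%@1109
... EMC_FREQ 23%@1600 GR3D_FREQ 2%@1109
"""


def gpu_utilisation(lines):
    """Return the GR3D_FREQ percentages found in tegrastats output lines."""
    values = []
    for line in lines:
        match = GR3D_PATTERN.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def summarise(values):
    """Return (minimum, mean, maximum) of a non-empty list of percentages."""
    return min(values), sum(values) / len(values), max(values)


if __name__ == "__main__":
    values = gpu_utilisation(SAMPLE.splitlines())
    low, mean, high = summarise(values)
    print(f"samples={len(values)} min={low}% mean={mean:.1f}% max={high}%")
```

For the eight samples above this gives a minimum of 1%, a maximum of 90% and a mean of roughly 37%, which matches the picture of bursty rather than sustained GPU load.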

Is this consistent with your observation?

Thanks.
