"fatal error: cuda_runtime_api.h: No such file or directory" when compiling with Jetpack 4.5.1

Hello, I am trying to reproduce the steps for running inference with the GestureNet pre-trained network from GitHub: NVIDIA-AI-IOT/gesture_recognition_tlt_deepstream (a project demonstrating how to train a gesture recognition deep learning pipeline: starting from a pre-trained detection model, repurposing it for hand detection with Transfer Learning Toolkit 3.0, combining it with the purpose-built gesture recognition model, and deploying it on NVIDIA® Jetson™ using the DeepStream SDK).
I downloaded the model and the tlt-converter, and I have two problems:

  • First, when converting the model for TensorRT with the tlt-converter, I get either a “no input dimension given” error or a segmentation fault, depending on the approach (I tried adding the -d parameter for the input dimensions).

  • Then, when I get to the “Building the application” step, I can’t manage to build the executable by compiling deepstream-app-bbox; I always get “fatal error: cuda_runtime_api.h: No such file or directory”. I tried changing $PATH and LD_LIBRARY_PATH, but I don’t know if this is the right approach. I also tried adding environment variables such as export CUDA_HOME=/usr/local/cuda-10.2/, but nothing worked.
    Here is my PATH:
    echo $PATH
    /home/nvidia/cmake-3.13.0/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin

and here is my LD_LIBRARY_PATH:
echo $LD_LIBRARY_PATH
/home/nvidia/cmake-3.13.0/bin/:/usr/local/cuda-10.2/

I am running a Jetson Nano 4 GB with an SD card flashed with JetPack 4.5.1. The installed CUDA version is 10.2, and I also installed DeepStream SDK 5.1.
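
For reference, the tlt-converter invocations I tried for the first issue were of this general shape (the key, input dimensions, and output node name below are placeholders, not necessarily the right values for this model):

```
./tlt-converter -k <model_key> \
                -d 3,160,160 \
                -o <output_node_name> \
                -t fp16 \
                -e resnet34_detector_qat.engine \
                resnet34_detector_qat.etlt
```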

Thank you for your effort.

Hi,

Sorry that the Makefile doesn’t link to CUDA correctly.
Please apply the following change:

diff --git a/Makefile b/Makefile
index 17e96f6..9d8e6ce 100644
--- a/Makefile
+++ b/Makefile
@@ -24,7 +24,7 @@ APP:= deepstream-app-bbox

 TARGET_DEVICE = $(shell gcc -dumpmachine | cut -f1 -d -)

-NVDS_VERSION:=5.0
+NVDS_VERSION:=5.1

 LIB_INSTALL_DIR?=/opt/nvidia/deepstream/deepstream-$(NVDS_VERSION)/lib/
 APP_INSTALL_DIR?=/opt/nvidia/deepstream/deepstream-$(NVDS_VERSION)/bin/
@@ -42,10 +42,12 @@ PKGS:= gstreamer-1.0 gstreamer-video-1.0 x11 json-glib-1.0

 OBJS:= $(SRCS:.c=.o)

-CFLAGS+= -I./ -I../../apps-common/includes -I../../../includes -DDS_VERSION_MINOR=0 -DDS_VERSION_MAJOR=5
+CFLAGS+= -I./ -I../../apps-common/includes -I../../../includes -DDS_VERSION_MINOR=1 -DDS_VERSION_MAJOR=5
+CFLAGS+= -I/usr/local/cuda-10.2/include

 LIBS+= -L$(LIB_INSTALL_DIR) -lnvdsgst_meta -lnvds_meta -lnvdsgst_helper -lnvdsgst_smartrecord -lnvds_utils -lnvds_msgbroker -lm \
        -lgstrtspserver-1.0 -ldl -Wl,-rpath,$(LIB_INSTALL_DIR)
+LIBS+= -L/usr/local/cuda-10.2/lib64/ -lcudart -lcuda

 CFLAGS+= `pkg-config --cflags $(PKGS)`

Or here is a modified Makefile, and you can replace the original one directly.
Makefile (2.3 KB)
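
After replacing the Makefile, rebuilding should be enough (the directory below assumes the default repo layout; adjust it to your checkout):

```
cd gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis
make clean
make
```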

Thanks.

Thank you, modifying the Makefile solved the compilation issue. However, I don’t have a computer capable of running the training phase (part 1 of the GitHub project), and I am missing two files needed for the deployment part:
At that point you are ready to start the deployment part. You will need the following outputs from this experiment to proceed:

  • experiment_dir_final/calibration_qat.bin
  • experiment_dir_final/resnet34_detector_qat.etlt

Copy these files over to your Jetson and consult the README.md document for further instructions.

Is it possible to reproduce the training phase directly on the Jetson Nano (using a cloud GPU, maybe), or is there somewhere I can find those two files?

Thank you

Hi,

The calibration cache is only used when running TensorRT in INT8 mode.
However, Nano doesn’t support INT8 operation (a hardware limitation), so you don’t really need it.
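
If the config still requests INT8, you can select the precision explicitly: in a standard Gst-nvinfer configuration this is controlled by the network-mode key (the snippet below is generic; the exact config file used by this sample may differ):

```
[property]
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
# int8-calib-file is only read when network-mode=1, so it can be omitted on Nano
```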

For the TLT model, you can get it with the instructions below:

Thanks.

Hello, thanks again for taking the time to respond.

I solved the tlt-converter issue, and I should now be able to run the app without the two calibration cache files, but I still get an error when launching ./deepstream-app-bbox -c source1_primary_detector_qat.txt:

Opening in BLOCKING MODE

Opening in BLOCKING MODE

Using winsys: x11
INFO: TrtISBackend id:4 initialized model: hcgesture_tlt
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
gstnvtracker: Past frame output is OFF
ERROR: Deserialize engine failed because file path: /home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine open error
0:00:11.627568224 8040 0x558f3b18d0 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1691> [UID = 1]: deserialize engine from file :/home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine failed
0:00:11.627679276 8040 0x558f3b18d0 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1798> [UID = 1]: deserialize backend context from engine from file :/home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine failed, try rebuild
0:00:11.627714800 8040 0x558f3b18d0 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
WARNING: INT8 not supported by platform. Trying FP16 mode.
parseModel: Failed to open TLT encoded model file /home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt
ERROR: failed to build network since parsing model errors.
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:11.634547936 8040 0x558f3b18d0 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
double free or corruption (top)
^C
Aborted

I see that the two missing files are causing errors, but it might be something else that I missed… If you have any ideas about what could cause the problem, thank you for sharing.

Have a good day

Hi,

Could you try to get the model from the instructions below:

And copy the model to ${gesture_recognition_tlt_deepstream}/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/?
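
For example (the source file name is a placeholder for whatever the downloaded file is called):

```
cp ~/Downloads/<downloaded_model>.etlt \
   ${gesture_recognition_tlt_deepstream}/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt
```

The target file name matters, since the DeepStream config refers to resnet34_detector_qat.etlt.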

Thanks.

Hello,

I downloaded the model, placed it at that location, and renamed it to ‘resnet34_detector_qat.etlt’, but I still get an error:

ERROR: Deserialize engine failed because file path: /home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine open error
WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1798> [UID = 1]: deserialize backend context from engine from file :/home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine failed, try rebuild
ERROR: [TRT]: UffParser: Could not read buffer.
parseModel: Failed to parse UFF model
ERROR: failed to build network since parsing model errors.
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
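
In case it is relevant, I also sanity-checked that the .etlt file is a real binary and not a truncated or HTML download (this helper is just my own ad-hoc check, not part of the sample):

```shell
# check_etlt: warn when a downloaded model file looks like an HTML error page
# instead of a binary .etlt -- one possible cause of TensorRT's
# "UffParser: Could not read buffer" error.
check_etlt() {
  if head -c 512 "$1" | grep -qi -e '<html' -e '<!DOCTYPE'; then
    echo "suspect: $1 looks like an HTML page; re-download the model"
  else
    echo "ok: $1 does not look like an HTML page"
  fi
}

# usage (path from my setup):
# check_etlt tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt
```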

Thank you for keeping up with me.

Hello, just to give you an update: I still can’t manage to get DeepStream running… Here is the complete log of the output, in case you can spot the error:

Opening in BLOCKING MODE
Opening in BLOCKING MODE

Using winsys: x11
INFO: TrtISBackend id:4 initialized model: hcgesture_tlt
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
gstnvtracker: Past frame output is OFF
ERROR: Deserialize engine failed because file path: /home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine open error
0:00:12.277603305 14402 0x558fad3a60 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1691> [UID = 1]: deserialize engine from file :/home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine failed
0:00:12.277677001 14402 0x558fad3a60 WARN nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1798> [UID = 1]: deserialize backend context from engine from file :/home/actemium/Documents/gesture_recognition_tlt_deepstream/deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/resnet34_detector_qat.etlt_b16_gpu0_int8.engine failed, try rebuild
0:00:12.277709448 14402 0x558fad3a60 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
WARNING: INT8 not supported by platform. Trying FP16 mode.
ERROR: [TRT]: UffParser: Could not read buffer.
parseModel: Failed to parse UFF model
ERROR: failed to build network since parsing model errors.
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:12.936898156 14402 0x558fad3a60 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
Bus error

Have a good day

Hi,

The downloaded file is in TLT format,
but somehow DeepStream tries to parse it with the UFF parser.

Let us check this internally and get back to you with more information later.
Thanks.

Hi,

In this example, two models are required: a hand detector and a gesture recognition model.
It seems only the gesture recognition model has been released.

You will need to run the TLT training to get a hand detection model via transfer learning.
Thanks.