Inference problem with FPEnet

I converted the FPEnet model with TAO, but when I try to run inference with it, I get the following error:

(env) eren@erennx:~/FPEnet$ /home/eren/env/bin/python /home/eren/FPEnet/test.py
[07/09/2022-22:37:39] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
Traceback (most recent call last):
File “/home/eren/FPEnet/test.py”, line 150, in
fpenet_obj = FpeNet(‘/home/eren/FPEnet/model32.trt’)
File “/home/eren/FPEnet/test.py”, line 35, in init
self._allocate_buffers()
File “/home/eren/FPEnet/test.py”, line 61, in _allocate_buffers
host_mem = cuda.pagelocked_empty(size, dtype)
NameError: name ‘dtype’ is not defined
[07/09/2022-22:37:43] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)

My code is adapted from test.py in the earlier forum topic “how to inference with fpenet” and is attached below:

test.py (4.8 KB)

I tried changing the batch size to -1. With that change inference runs, but the landmark coordinates are all 0.

Where could the problem be?
I appreciate your help. Best regards.

Hi,

In the test.py:

fpenet_obj = FpeNet('fpenet_b1_fp32.trt')

It looks like you deserialize a TensorRT engine directly.
Did you generate that engine on the Xavier NX, with the same JetPack version?

Please note that a TensorRT engine is not portable, since it is optimized for the hardware it is built on.
You will need to generate it on the same GPU architecture and TensorRT software version that you run it on.

Thanks.

Yes, I converted it on the same device (Jetson Xavier NX), using several variations of the following command:

tao-converter
-k nvidia_tlt
-t fp16 (I converted also with fp32)
-p input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80
-e /home/eren/FPEnet/model.engine (and used also various other names, I think it is not important)
-m 1
-w 1000000000 (tried also without -w)
/home/eren/FPEnet/model.etlt
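
For reference, the -p argument in the command above packs one optimization profile into a single string: the input name, followed by the minimum, optimal, and maximum shapes. A minimal sketch of how that string breaks down (the helper name parse_profile is hypothetical and is not part of tao-converter):

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def parse_profile(spec: str) -> Dict[str, object]:
    """Split '<input>,<min>,<opt>,<max>' into its four parts.

    Hypothetical helper for illustration; tao-converter does its own parsing.
    """
    name, min_s, opt_s, max_s = spec.split(",")

    def to_shape(text: str) -> Shape:
        # '1x1x80x80' -> (1, 1, 80, 80)
        return tuple(int(dim) for dim in text.split("x"))

    return {
        "input": name,
        "min": to_shape(min_s),
        "opt": to_shape(opt_s),
        "max": to_shape(max_s),
    }


profile = parse_profile("input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80")
print(profile["input"])
print(profile["min"], profile["opt"], profile["max"])
```

With this profile the batch dimension may vary between 1 and 2 at run time, which is consistent with an engine that reports a dynamic batch dimension.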

Hi,

We have a deployed sample for FPENet below.
Could you give it a try?

Thanks.

I had downloaded the deepstream_tao_apps folder and tried the following steps:

$ cd apps/tao_others/deepstream-faciallandmark-app
$ export CUDA_VER=10.2
$ make
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/deepstream/deepstream/lib/cvcore_libs
$ ./deepstream-faciallandmark-app 2 …/…/…/configs/facial_tao/sample_faciallandmarks_config.txt file:///usr/data/faciallandmarks_test.jpg ./landmarks

but the make step fails to compile:

make
g++ -c -o deepstream_faciallandmark_app.o -fpermissive -Wall -Werror -DPLATFORM_TEGRA -I/opt/nvidia/deepstream/deepstream/sources/includes -I/opt/nvidia/deepstream/deepstream/sources/includes/cvcore_headers -I /usr/local/cuda-10.2/include -I …/common pkg-config --cflags gstreamer-1.0 -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-sign-compare -Wno-deprecated-declarations deepstream_faciallandmark_app.cpp
deepstream_faciallandmark_app.cpp:46:10: fatal error: nvds_yml_parser.h: No such file or directory
#include “nvds_yml_parser.h”
^~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:70: recipe for target ‘deepstream_faciallandmark_app.o’ failed
make: *** [deepstream_faciallandmark_app.o] Error 1

I searched for nvds_yml_parser.h, but the file does not exist on my system.
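
That search can be repeated with a recursive lookup under the DeepStream include directory. This is only a sketch: the function name find_header is hypothetical, and the root path is the one passed with -I in the compile command above.

```python
from pathlib import Path
from typing import List


def find_header(root: str, name: str) -> List[Path]:
    """Return every file called `name` below `root` (empty list if none)."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob(name) if p.is_file())


if __name__ == "__main__":
    hits = find_header(
        "/opt/nvidia/deepstream/deepstream/sources/includes",
        "nvds_yml_parser.h",
    )
    print(hits if hits else "header not found under the DeepStream includes")
```

An empty result means the installed DeepStream does not ship that header, so the application source and the installed DeepStream release do not match.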

I downloaded deepstream_tao_apps two months ago.
Apart from this, DeepStream (deepstream-6.0) works.

jetson_release -v

  • NVIDIA Jetson Xavier NX (Developer Kit Version)
    • Jetpack UNKNOWN [L4T 32.7.2]
    • NV Power Mode: MODE_20W_4CORE - Type: 7
    • jetson_stats.service: active
  • Board info:
    • Type: Xavier NX (Developer Kit Version)
    • SOC Family: tegra194 - ID:25
    • Module: P3668 - Board: P3509-000
    • Code Name: jakku
    • CUDA GPU architecture (ARCH_BIN): 7.2
    • Serial Number: 1421520056113
  • Libraries:
    • CUDA: 10.2.300
    • cuDNN: 8.2.1.32
    • TensorRT: 8.2.1.8
    • Visionworks: 1.6.0.501
    • OpenCV: 4.1.1 compiled CUDA: NO
    • VPI: ii libnvvpi1 1.2.3 arm64 NVIDIA Vision Programming Interface library
    • Vulkan: 1.2.70
  • jetson-stats:
    • Version 3.1.3
    • Works on Python 3.6.9

The CUDA version is correct.

What am I missing?

As a next step I installed DeepStream 6.1, because the GitHub link above showed ‘fc-camel Update for DS 6.1’,
but running the samples with:

deepstream-app -c configs/deepstream-app/source30_1080p_dec_infer-resnet_tiled_display_int8.txt

gives an error of:

deepstream-app: error while loading shared libraries: libyaml-cpp.so.0.6: cannot open shared object file: No such file or directory

I have JetPack 4.6. Now that DeepStream 6.1 is installed, do I have to install JetPack 5, install TAO, convert FPEnet to a TensorRT engine again, and rerun the code?

Does tao-converter for Jetson NX work with JetPack 5 and DeepStream 6.1? Should I reformat and switch to JetPack 5?

Hi,

Since your JetPack version is 4.6, please stay on DeepStream 6.0 for compatibility.

The repository also contains source that works with DeepStream 6.0.
Please check out the release/tao3.0_ds6.0ga branch and try again.

For example

$ git clone -b release/tao3.0_ds6.0ga https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps.git

Thanks.
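
The pairing described above can be written down as a small lookup. This is a sketch under assumptions: only the JetPack 4.6 to DeepStream 6.0 pairing comes from this thread, while the L4T-to-JetPack entries and the JetPack 5 row are recalled from NVIDIA release notes and should be verified. The function name is hypothetical.

```python
# L4T release -> JetPack version (assumption, recalled from release notes; verify).
L4T_TO_JETPACK = {
    "32.6.1": "4.6",
    "32.7.1": "4.6.1",
    "32.7.2": "4.6.2",
    "34.1.1": "5.0.1 DP",
}

# JetPack major version -> DeepStream release line.
# The "4" row is stated in this thread; the "5" row is an assumption.
JETPACK_MAJOR_TO_DEEPSTREAM = {
    "4": "6.0",
    "5": "6.1",
}


def deepstream_for_l4t(l4t: str) -> str:
    """Return the DeepStream release line expected to match an L4T release."""
    jetpack = L4T_TO_JETPACK.get(l4t)
    if jetpack is None:
        raise ValueError("unknown L4T release: {}".format(l4t))
    return JETPACK_MAJOR_TO_DEEPSTREAM[jetpack.split(".")[0]]


# L4T 32.7.2 is the release reported by jetson_release earlier in the thread.
print(deepstream_for_l4t("32.7.2"))
```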

OK.
I reinstalled DeepStream 6.0 and checked out release/tao3.0_ds6.0ga.

Running the test gives the following result:

eren@erennx:~/deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app$ ./deepstream-faciallandmark-app 2 …/…/…/configs/facial_tao/sample_faciallandmarks_config.txt file:///usr/data/faciallandmarks_test.jpg ./landmarks
Request sink_0 pad from streammux
Now playing: file:///usr/data/faciallandmarks_test.jpg
ERROR: Deserialize engine failed because file path: /home/eren/deepstream_tao_apps/configs/facial_tao/…/…/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine open error
0:00:02.786814738 11026 0x559e69d920 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1889> [UID = 2]: deserialize engine from file :/home/eren/deepstream_tao_apps/configs/facial_tao/…/…/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine failed
0:00:02.814262025 11026 0x559e69d920 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1996> [UID = 2]: deserialize backend context from engine from file :/home/eren/deepstream_tao_apps/configs/facial_tao/…/…/models/faciallandmark/faciallandmarks.etlt_b32_gpu0_int8.engine failed, try rebuild
0:00:02.814524203 11026 0x559e69d920 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 2]: Trying to create engine from model files
WARNING: INT8 calibration file not specified/accessible. INT8 calibration can be done through setDynamicRange API in ‘NvDsInferCreateNetwork’ implementation
NvDsInferCudaEngineGetFromTltModel: Failed to open TLT encoded model file /home/eren/deepstream_tao_apps/configs/facial_tao/…/…/models/faciallandmark/faciallandmarks.etlt
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:03.326210325 11026 0x559e69d920 ERROR nvinfer gstnvinfer.cpp:632:gst_nvinfer_logger: NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1934> [UID = 2]: build engine file failed
ERROR: [TRT]: 2: [logging.cpp::decRefCount::61] Error Code 2: Internal Error (Assertion mRefCount > 0 failed. )
corrupted size vs. prev_size
Aborted (core dumped)

Pointing it at a downloaded 80x80 square test.jpg ends the same way.

I appreciate any help. Thanks.

The second problem turned out to be that the models had not been downloaded.

After running:

:~/deepstream_tao_apps$ ./download_models.sh

which downloads the models folder, and pointing the app at a test.jpg, deepstream-faciallandmark-app could run inference. While DeepStream was loading the etlt files, it created multiple engine files (INT8, INT16) under its models folder.

Coming back to the first question: I took ‘facenet.etlt_b1_gpu0_int8.engine’, which DeepStream had created during inference, and used it with the test.py file for FPEnet, but it gave the same error as at the beginning:

~/FPEnet$ /home/eren/env/bin/python /home/eren/FPEnet/test.py
Traceback (most recent call last):
File “/home/eren/FPEnet/test.py”, line 150, in
fpenet_obj = FpeNet(‘/home/eren/FPEnet/facenet.etlt_b1_gpu0_int8.engine’)
File “/home/eren/FPEnet/test.py”, line 35, in init
self._allocate_buffers()
File “/home/eren/FPEnet/test.py”, line 61, in _allocate_buffers
host_mem = cuda.pagelocked_empty(size, dtype)
NameError: name ‘dtype’ is not defined
[07/13/2022-13:11:56] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
[07/13/2022-13:11:56] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::29] Error Code 1: Cuda Driver (invalid device context)
Segmentation fault (core dumped)

Can I not use those engine files? What else could be the problem?

Hi,

It looks like DeepStream and TensorRT work fine.
The error most likely comes from the custom implementation instead.

Have you run the script on other platforms?
Could you check that all the variables are set correctly?

For example, the dtype mentioned in the error:

NameError: name ‘dtype’ is not defined

Also, since the DeepStream sample is working,
you can try using it for inference directly.

Thanks.
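
The NameError above suggests that dtype is used in _allocate_buffers before anything assigns it. A minimal sketch of the usual pattern, where the size and dtype of each binding are resolved inside the loop before the allocation call. The Binding records are made-up stand-in data, and the comments name the TensorRT 8.x Python calls that would normally supply the real values.

```python
from typing import List, NamedTuple, Tuple


class Binding(NamedTuple):
    name: str
    shape: Tuple[int, ...]
    dtype: str  # for example "float32"


def plan_buffers(bindings: List[Binding]) -> List[Tuple[str, int, str]]:
    """Return (name, element_count, dtype) for each binding."""
    plans = []
    for binding in bindings:
        # Real script: size = trt.volume(engine.get_binding_shape(name))
        size = 1
        for dim in binding.shape:
            size *= dim
        # Real script: dtype = trt.nptype(engine.get_binding_dtype(name))
        dtype = binding.dtype
        # Only at this point is cuda.pagelocked_empty(size, dtype) safe to call.
        plans.append((binding.name, size, dtype))
    return plans


# Stand-in data; a real engine reports its own binding names and shapes.
example = [Binding("input_face_images:0", (1, 1, 80, 80), "float32")]
print(plan_buffers(example))
```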

After checking the code again and correcting the errors, running it gave the following error:

Traceback (most recent call last):
File “/test.py”, line 156, in
fpenet_obj = FpeNet(‘/home/eren/FPEnet/facenet.etlt_b1_gpu0_int8.engine’)
File “/home/eren/FPEnet/test.py”, line 35, in init
self._allocate_buffers()
File “/home/eren/FPEnet/test.py”, line 65, in _allocate_buffers
dtype = binding_to_type[str(binding)]
KeyError: ‘input_1’
[07/14/2022-09:35:48] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)

Is this error caused by not running in the correct TAO container?

I downloaded nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3 to run it, but the image turned out to be for amd64 and failed with the following error:

(env) eren@erennx:~/FPEnet$ sudo docker run --gpus all -it -v /workspace/tlt-experiments/:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3 /bin/bash
WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
exec /usr/local/bin/install_ngc_cli.sh: no such file or directory

What should I do to run it on my Jetson NX? Is the L4T Base image the right container? How can I run my files on the Jetson NX without an internet connection? Is it possible to run on the Jetson NX without any containers?

Hi,

Based on the error:

dtype = binding_to_type[str(binding)]
KeyError: ‘input_1’

It looks like an implementation error.
You can find a working example below:
https://elinux.org/Jetson/L4T/TRT_Customized_Example#OpenCV_with_PLAN_model

Thanks.
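
A KeyError at that line means the hard-coded binding_to_type table has no entry for a binding name that the engine reports. A sketch of a check that fails with a clearer message follows. The function name and the table contents are hypothetical; only the names input_face_images:0 and input_1 appear in this thread. As an unconfirmed guess, the file name facenet.etlt_b1_gpu0_int8.engine suggests the engine was built from a face detection model rather than FPENet, which would explain the unfamiliar input_1 binding.

```python
from typing import Dict, Iterable, List


def missing_bindings(table: Dict[str, str], engine_bindings: Iterable[str]) -> List[str]:
    """Return the engine binding names that the lookup table does not cover."""
    return [name for name in engine_bindings if name not in table]


# Hypothetical table in the style of test.py; FPENet output names are left out.
binding_to_type = {"input_face_images:0": "float32"}

# With TensorRT 8.x the names could be listed with: [str(b) for b in engine]
reported = ["input_1"]

unknown = missing_bindings(binding_to_type, reported)
if unknown:
    print("engine does not match the table, unknown bindings:", unknown)
```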


Even with the method you mentioned, I could not get it to work. I now get:
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

tao-converter and deepstream-app create the same type of engine: both accept an input shape of (-1, 1, 80, 80),

which leads to the out-of-memory error. Changing the batch size to -1 results in landmarks that are all (0, 0).

I still appreciate the help…
Thanks
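
One possible explanation for the out-of-memory error, stated as an assumption rather than a confirmed diagnosis: when a shape contains -1, the product of its dimensions is negative, so a buffer size computed from the (-1, 1, 80, 80) shape is not usable and the host allocation can fail. A sketch of substituting a concrete batch size before computing the size follows. resolve_shape and volume are hypothetical helpers; with TensorRT 8.x the same concrete shape would also need to be given to the execution context, for example with context.set_binding_shape.

```python
from typing import Tuple


def resolve_shape(shape: Tuple[int, ...], batch_size: int) -> Tuple[int, ...]:
    """Replace every dynamic (-1) dimension with a concrete batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return tuple(batch_size if dim == -1 else dim for dim in shape)


def volume(shape: Tuple[int, ...]) -> int:
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


dynamic = (-1, 1, 80, 80)
print(volume(dynamic))                    # -6400: not a usable buffer size
print(volume(resolve_shape(dynamic, 1)))  # 6400
```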

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Is it possible to set the batch size to 1?
Thanks