Xavier NX multiple container demo stuck launching TRTIS

Hi,

I am trying to launch the multi-container demo for the Xavier NX.

It launches the DeepStream Container with the person detector and tracker very quickly.

It then gets stuck launching the TRTIS server.

I would appreciate any help!

Thanks.

The BERT demo eventually launched but the Pose and Gaze containers did not.

Got this error:
Error of failed requests: BadWindow (invalid window parameter)
Major opcode of failed request: 15 (X_QueryTree)
Resource id in failed request: 0xe12dd0
Serial number of failed request: 202
Current serial number in output stream: 202

Running each script individually works (run_peopleDetect.sh, run_pose.sh, etc.). When I try to run all 4 together (not through the containers, but by starting each separately), I run out of memory.
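
For anyone hitting the same out-of-memory condition, here is a minimal sketch for checking how much memory is free before starting each script. It assumes a Linux /proc/meminfo that reports a MemAvailable line in kB. The helper name mem_available_mib is made up for illustration and is not part of the demo.

```python
def mem_available_mib(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text in MiB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The line looks like "MemAvailable:  2048000 kB".
            return int(line.split()[1]) // 1024
    return None


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print("MemAvailable (MiB):", mem_available_mib(f.read()))
    except FileNotFoundError:
        print("/proc/meminfo not found; this sketch only applies to Linux")
```

Calling it between launches shows roughly how much memory each script takes.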

Hi,

Error of failed requests: BadWindow (invalid window parameter)

This issue is related to the X11 server.
Have you connected a display device to the XavierNX?

Thanks.

Yes, I have.

Hi,

Have you exported the display to the Docker container?

$ xhost +
$ sudo docker run ... -e DISPLAY=$DISPLAY ...

Thanks.
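
A minimal sketch of how the display-forwarding flags fit into a full docker run command line. The flags are the ones quoted in this thread. The function name build_docker_run, the "<image>" placeholder, and the ":0" fallback when DISPLAY is unset are assumptions for illustration, not part of the demo scripts.

```python
import os
import shlex


def build_docker_run(image, display=None, extra_args=()):
    """Return a `docker run` argument list that forwards the host X display."""
    # Assumption: fall back to ":0" when DISPLAY is not set in the environment.
    display = display or os.environ.get("DISPLAY", ":0")
    return [
        "sudo", "docker", "run", "-it", "--rm",
        "--net=host", "--runtime", "nvidia",
        "-e", f"DISPLAY={display}",              # pass the display name into the container
        "-v", "/tmp/.X11-unix/:/tmp/.X11-unix",  # share the X11 socket directory
        *extra_args,
        image,
    ]


if __name__ == "__main__":
    # "<image>" is a placeholder; substitute the container image you are launching.
    print(shlex.join(build_docker_run("<image>")))
```

Printing the command instead of executing it makes it easy to compare against the docker run line in the demo scripts.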

I am launching the script file that is part of the demo and, looking through it, it does set the display.
For instance, run_pose.sh has
DISPLAY=$DISPLAY in the sudo docker run line.

Interestingly, the following line in run_demo.sh is commented out:
#export DISPLAY=:0

Hi,

Sorry for the late update.
Have you fixed this issue yet?
If not, would you mind trying this command within the container first?

$ export DISPLAY=:1

Thanks.

I tried it. When I run the demo, I get:
xhost: unable to open display ":1"

Setting
$ export DISPLAY=:0

lets it run, but only the speech and person-counting demos work; the other two never load.
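
A minimal sketch for finding out which local display numbers exist, so the DISPLAY value does not have to be guessed. It assumes the X server exposes Unix sockets named X0, X1, ... under /tmp/.X11-unix, which is the usual layout on Linux. The helper name local_x_displays is hypothetical.

```python
import os
import re


def local_x_displays(socket_dir="/tmp/.X11-unix"):
    """Return display names such as ':0' for each X socket found in socket_dir."""
    try:
        names = os.listdir(socket_dir)
    except FileNotFoundError:
        return []
    numbers = []
    for name in names:
        match = re.fullmatch(r"X(\d+)", name)  # socket files are named X<display number>
        if match:
            numbers.append(int(match.group(1)))
    return [f":{n}" for n in sorted(numbers)]


if __name__ == "__main__":
    print("Local X displays:", local_x_displays())
```

If the list contains only ":0", that would match the behaviour above, where ":1" cannot be opened.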

Hi,

Thanks.
We will give it a try and share more information with you later.

Hi,

Sorry for the late update.
This is caused by an incompatible serialized TensorRT engine file.

Please note that these Docker images were built on JetPack 4.4 DP (TRT-7.1.0).
To run them on JetPack 4.4 GA (TRT-7.1.3), please recompile the TensorRT engine first.

$ sudo docker run -it --rm --net=host --runtime nvidia  -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/jets
$ python3
>>> import torch
>>> import torch2trt
>>> import trt_pose.models
>>> import tensorrt as trt
>>> MODEL = trt_pose.models.densenet121_baseline_att
>>> model = MODEL(18, 42).cuda().eval()
>>> model.load_state_dict(torch.load('/pose/generated/densenet121_baseline_att.pth'))
>>> data = torch.randn((1, 3, 224, 224)).cuda().float()
>>> model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25, log_level=trt.Logger.VERBOSE)
>>> torch.save(model_trt.state_dict(), '/pose/generated/densenet121_baseline_att_trt.pth')
>>> exit()
$ python3 run_pose_pipeline.py /videos/pose_video.mp4

We can run the sample without issue after regenerating the TensorRT engine.
Thanks.
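
A minimal sketch of the version comparison behind this fix: a serialized engine is generally expected to be loaded by the same TensorRT version that built it. The function name engine_needs_rebuild and the choice to compare only the first three version components are assumptions for illustration. The two version strings are the ones quoted in the post above.

```python
def engine_needs_rebuild(built_with, runtime):
    """Return True when the engine was built by a different TensorRT version."""
    def parse(version):
        # Assumption: compare major.minor.patch and ignore any build suffix.
        return tuple(int(part) for part in version.split(".")[:3])

    return parse(built_with) != parse(runtime)


if __name__ == "__main__":
    built_with = "7.1.0"  # JetPack 4.4 DP, per the post above
    try:
        import tensorrt as trt
        runtime = trt.__version__
    except ImportError:
        runtime = "7.1.3"  # JetPack 4.4 GA, per the post above
    print("rebuild needed:", engine_needs_rebuild(built_with, runtime))
```

When this reports a mismatch, regenerating the engine inside the container, as in the torch2trt session above, is the step to take.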

I have the same issue. Can you tell me how you recompiled the TensorRT engine? Thank you!

Hi zhidong.sum

Please open a new topic for your issue. Thanks.