[BUG] dwcgf error of NvSciIpcOpenEndpoint with shm header not cleared

Required Info:

  • Software Version
    DRIVE OS 6.0.6
  • Target OS
    Linux
  • SDK Manager Version
    1.9.2.10884
  • Host Machine Version
    native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers

Describe the bug

To Reproduce

The repo is here: GitHub - ZhenshengLee/nv_driveworks_demo: Nvidia Driveworks Demo with CGF, ROS2 and Docker.

# go to the target system, which is an Orin devkit
cd ./nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/bin/
sudo ./run_cgf.sh

Expected behavior

The helloworld app runs as expected.

Actual behavior

run_cgf.sh outputs the following error:

Running command: /usr/local/driveworks/bin/launcher --binPath=/usr/local/driveworks/bin --spec=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/graphs/app/DWCGFHelloworld.app.json --logPath=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/LogFolder --path=/usr/local/driveworks/bin --datapath=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/data --dwdatapath=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/data --vdcpath=/usr/local/driveworks/bin --schedule=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/bin/DWCGFHelloworld__standardSchedule.stm --start_timestamp=0 --mapPath=maps/sample/sanjose_loop --loglevel=DW_LOG_VERBOSE --fullscreen=0 --winSizeW=1920 --winSizeH=1200 --virtual=1 --disableStmControlLogger=1 --gdb_debug=0 --app_parameter= > /home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/LogFolder/launcher.log 2>&1
Check if reset NetworkStack needed
Restore LD_LIBRARY_PATH to 
=======================================================================
launcher exit status: 33

stm_master.log shows the following:

appExecutable: stm_master
argv[0] : /usr/local/driveworks/bin/stm_master
argv[1] : --schedule=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/bin/DWCGFHelloworld__standardSchedule.stm
argv[2] : --allow-unregistered-runnables
argv[3] : --soc=TegraA
argv[4] : --timeout-us=60000000
argv[5] : -m
argv[6] : -v
argv[7] : --schedule-manager-name=CGF-ScheduleManager
argv[8] : --num-input-schedule=1
[STM WARNING]:[av/stm/runtime/src/core/stdout.c][logSetVerbose] [25]: Verbose mode has been enabled for STM. ***Note that this mode affects STM latency guarantees, and is meant for use only in debugging. Do not use '-v' for performance testing.
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitScheduleList] Input Schedule: /home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/bin/DWCGFHelloworld__standardSchedule.stm
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitScheduleList] Following 1 schedules provided.
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitScheduleList] 	 Schedule0 : /home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/example/dwcgf_helloworld/bin/DWCGFHelloworld__standardSchedule.stm.
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] Schedule ID : 0
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client name: helloworld_process0
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client soc: TegraA
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client name: framesync_TegraA_helloworldHyperepoch_helloworldEpoch
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client soc: TegraA
!ERR![L:91]:nvsciipc_ipc_check_end: endpoint is already occupied by (pid:71396)
!ERR![L:971]:nvsciipc_ipc_open_endpoint: init resources is failed: 514
!ERR![L:581]:nvsciipc_ipc_close_endpoint_internal: stm_73746d_0: pid is changed (71396->71487) shm header is NOT cleared
!ERR![L:1071]error: NvSciIpcOpenEndpoint: 
!ERR![L:1072]stm_73746d_0: 514
av/stm/runtime/src/master/master.c:124 assertion failure, errno=0 (Success)
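
The three `!ERR!` lines above suggest that the endpoint stm_73746d_0 was still marked as owned by an earlier process (pid 71396) when a new process (pid 71487, presumably the new stm_master) tried to open it. A minimal sketch for checking whether that earlier owner is still running, assuming the log line format shown above. The function names and the log path are hypothetical and not part of DRIVE OS or DriveWorks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID that NvSciIpc reports as occupying the endpoint.
# Assumes the log line format "endpoint is already occupied by (pid:NNN)" seen above.
# If the message appears several times, the last occurrence wins.
occupying_pid() {
    sed -n 's/.*endpoint is already occupied by (pid:\([0-9][0-9]*\)).*/\1/p' "$1" | tail -n 1
}

# Print "alive" if the PID has an entry under /proc (Linux only), otherwise "gone".
pid_state() {
    if [ -d "/proc/$1" ]; then echo alive; else echo gone; fi
}

# Usage (the log path is an assumption; point it at your own LogFolder):
#   pid=$(occupying_pid LogFolder/stm_master.log)
#   [ -n "$pid" ] && echo "endpoint owner $pid is $(pid_state "$pid")"
```

If the reported owner is "gone", the endpoint was most likely left behind by a process that exited without closing it, which would match the "shm header is NOT cleared" message. If it is "alive", an earlier stm_master or framesync instance is probably still running.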

Additional context

This bug happens occasionally.

Dear @lizhensheng,
What is the frequency of occurrence of this issue? Once you hit this error, do you get it on every subsequent run? If so, does a restart fix the issue?

It’s hard to say how frequent it is; it happens only sometimes.
Once I hit this error, I keep getting it for fewer than 10 retries.
After some time the issue goes away and run_cgf.sh works again.

Overuse of the following cleanup commands may cause this issue:

ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
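
The cleanup commands above send SIGKILL immediately, which gives a process no chance to run its own shutdown code, and then remove everything under /dev/shm and /dev/mqueue regardless of who still uses it. A gentler sketch is to send SIGTERM first and force-kill only the processes that are still alive after a timeout. The function name is hypothetical, and whether stm_master and framesync actually release their NvSciIpc endpoints on SIGTERM is an assumption that this thread does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop processes gracefully before resorting to SIGKILL.
# Usage: stop_gracefully <timeout_seconds> <pid>...
stop_gracefully() {
    timeout=$1; shift
    # Ask politely first so each process can run its own cleanup.
    kill -s TERM "$@" 2>/dev/null || true
    while [ "$timeout" -gt 0 ]; do
        alive=""
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null && alive="$alive $pid"
        done
        # Everything exited within the timeout: nothing left to do.
        [ -z "$alive" ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Force-kill only the processes that did not exit after SIGTERM.
    for pid in "$@"; do
        kill -0 "$pid" 2>/dev/null && kill -s KILL "$pid"
    done
    return 0
}
```

The PID list could come from something like pgrep -f 'framesync|stm_', which mirrors the grep patterns in the original one-liner. Because run_cgf.sh is started with sudo, the function would need to run as root to signal those processes.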

Close.

Dear @lizhensheng,
I appreciate your investigation. May I know if you have fixed the issue? If so, could you share the steps?
