[BUG] Running a CGF app terminates with error: Could not open ShmDescriptor; errno: 17 (File exists)

Required Info:

  • Software Version
    DRIVE OS 6.0.6
  • Target OS
    Linux
  • SDK Manager Version
    1.9.2.10884
  • Host Machine Version
    native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers

Describe the bug

There is a CGF project with a launch script, runRawCameraDeployPipe.sh, in the ZhenshengLee/nv_driveworks_demo repository on GitHub (main branch).

This script is based on run_cgf from the dwcgf documentation page.

Running this script without sudo causes the following reproducible error from stm_master:

!ERR![L:84]:nvsciipc_ipc_check_end: pid is not 0, but process doesn't exist, (pid:41715)
!ERR![L:84]:nvsciipc_ipc_check_end: pid is not 0, but process doesn't exist, (pid:41715)
terminate called after throwing an instance of 'nvstm::ErrnoError'
  what():  Could not open ShmDescriptor; errno: 17 (File exists)

To Reproduce

cd target/aarch64/install
./bin/dwcgf_image_pipe/runRawCameraDeployPipe.sh

Expected behavior

The image pipeline runs without errors.

Actual behavior

stm_master.log

appExecutable: stm_master
argv[0] : /usr/local/driveworks/bin/stm_master
argv[1] : --schedule=/home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/bin/dwcgf_image_pipe/DWCGFRawCameraDeployPipe__standardSchedule.stm
argv[2] : --allow-unregistered-runnables
argv[3] : --soc=TegraA
argv[4] : --timeout-us=60000000
argv[5] : -m
argv[6] : -v
argv[7] : --schedule-manager-name=CGF-ScheduleManager
argv[8] : --num-input-schedule=1
[STM WARNING]:[av/stm/runtime/src/core/stdout.c][logSetVerbose] [25]: Verbose mode has been enabled for STM. ***Note that this mode affects STM latency guarantees, and is meant for use only in debugging. Do not use '-v' for performance testing.
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitScheduleList] Input Schedule: /home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/bin/dwcgf_image_pipe/DWCGFRawCameraDeployPipe__standardSchedule.stm
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitScheduleList] Following 1 schedules provided.
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitScheduleList] 	 Schedule0 : /home/nvidia/zhensheng/orin_ws/nv_driveworks_demo/target/aarch64/install/bin/dwcgf_image_pipe/DWCGFRawCameraDeployPipe__standardSchedule.stm.
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] Schedule ID : 0
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client name: pilotPipe_process
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client soc: TegraA
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client name: framesync_TegraA_pilotHyperepoch_cameraEpoch
[STM INFO]:[av/stm/runtime/src/master/master.c][stmMasterInitClientPerSchedule] 	Client soc: TegraA
!ERR![L:84]:nvsciipc_ipc_check_end: pid is not 0, but process doesn't exist, (pid:41715)
!ERR![L:84]:nvsciipc_ipc_check_end: pid is not 0, but process doesn't exist, (pid:41715)
terminate called after throwing an instance of 'nvstm::ErrnoError'
  what():  Could not open ShmDescriptor; errno: 17 (File exists)

The full CGF log is in this zip:

LogFolder.zip (11.6 KB)

Additional context

  1. sudo is not used.
  2. sudo should not be required.

Dear @lizhensheng,
So this error does not occur when you run the shell script with sudo?
Do you still see this issue when you run the app after restarting the target?

You seem to have used run_cgf.sh as the basis for your shell script. We launch the application with sudo run_cgf.sh. May I know why you think sudo is not needed in your case?

I tried with sudo, and the error disappears.

Yes, the issue still occurs after a restart; restarting does not help.

Yes, I'm developing some custom CGF apps. sudo should not be needed, for these reasons:

  1. Security. sudo is not allowed in our project workflow.
  2. Consistency. The official DriveWorks samples show that all, or almost all, DriveWorks programs run without sudo. The permission requirements should be well defined.
  3. Interoperability. Some other middleware in our stack runs as a non-root user.

Thanks.

I checked the contents of /dev/shm and /dev/mqueue, and they do not disappear after a system reboot.
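For anyone who wants to repeat this check, here is a minimal sketch that lists the leftover objects together with their owner and mode. It assumes GNU find and stat, which ship with Ubuntu. The function name list_ipc_objects is made up for illustration and is not part of DriveWorks or STM. The idea that objects left behind by an earlier sudo run could block a later non-root run is only a guess, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of DriveWorks/STM): print one line per entry
# in each given directory as "<uid> <user> <octal mode> <path>".
list_ipc_objects() {
    for dir in "$@"; do
        # Skip directories that do not exist (e.g. /dev/mqueue is not mounted).
        [ -d "$dir" ] || continue
        # Look only at direct children; stat prints owner and permissions.
        find "$dir" -mindepth 1 -maxdepth 1 -exec stat -c '%u %U %a %n' {} +
    done
}
```

Running list_ipc_objects /dev/shm /dev/mqueue as the non-root user and comparing the first column with the output of id -u shows which entries belong to another user, such as root.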

After I deleted everything in both folders, the CGF app reported the following errors:

[STM][ERROR] Could not open MqExistingDescriptor; errno: 2 (No such file or directory)
av/stm/runtime/src/client/stm_manager.c:65 assertion failure, errno=2 (No such file or directory)
[2023-05-24T06:52:25.116347Z][ERROR][tid:0][Launcher.cpp:917][Launcher] Process schedule_manager:3543 terminated by signal: 6 (Aborted)
[2023-05-24T06:52:25.116421Z][INFO][tid:0][Launcher.cpp:974][Launcher] waitForChildExit: No more child process!
[2023-05-24T06:52:25.116435Z][ERROR][tid:0][Launcher.cpp:1220][Launcher] All child processes has been killed successfully.
[2023-05-24T06:52:25.116534Z][DEBUG][tid:0][Launcher.cpp:1450][Launcher] swc_list.txt content:

Dear @lizhensheng,
When you hit this issue, could you check whether any stm process is still running (ps -aux | grep stm)? If so, please kill those processes before re-running.
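A small sketch of that check, assuming pgrep from procps is available on the target. The function name list_procs and the default pattern stm are illustrative only. Unlike ps -aux | grep stm, pgrep does not report itself, so an empty result really means that no process name matched.

```shell
#!/bin/sh
# Hypothetical helper: list running processes whose name matches a pattern.
# Prints "<pid> <command line>" for each match on stdout and a hint on stderr.
list_procs() {
    pattern=${1:-stm}
    if pgrep -a "$pattern"; then
        echo "Processes matching '$pattern' are still running; stop them before re-running." >&2
    else
        echo "No running process matches '$pattern'." >&2
    fi
}
```

Calling list_procs with no argument looks for process names containing stm, which would include stm_master.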

It appears that run_cgf.sh needs to be run with sudo. Let me check whether there is a workaround (WAR).

Thanks, I will check.

I don't think sudo is really needed to run CGF apps, because there is a helloworld CGF app that runs as a non-root user. See example/dwcgf/helloworld in the ZhenshengLee/nv_driveworks_demo repository on GitHub (main branch).

In our workflow, the current workaround (WAR) is built with ROS2, which runs only as a non-root user.

For security reasons, there is a strong need to run as non-root.

Thanks.

Friendly ping @SivaRamaKrishnaNV for updates.

We also found another error reported when sudo is not used:

launcher.log

[STM] Waiting for STM master to start...
[STM][ERROR] Could not open MqExistingDescriptor; errno: 2 (No such file or directory)
av/stm/runtime/src/client/stm_manager.c:65 assertion failure, errno=2 (No such file or directory)
[2023-05-26T08:08:10.552895Z][ERROR][tid:0][Launcher.cpp:917][Launcher] Process schedule_manager:55331 terminated by signal: 6 (Aborted)
[2023-05-26T08:08:10.552973Z][INFO][tid:0][Launcher.cpp:974][Launcher] waitForChildExit: No more child process!
[2023-05-26T08:08:10.552981Z][ERROR][tid:0][Launcher.cpp:1220][Launcher] All child processes has been killed successfully.
[2023-05-26T08:08:10.553114Z][DEBUG][tid:0][Launcher.cpp:1450][Launcher] swc_list.txt content:
line 1 : pilotPipe_process,127.0.0.1
line 2 : 

stm_master.log

!ERR![L:84]:nvsciipc_ipc_check_end: pid is not 0, but process doesn't exist, (pid:1199)
!ERR![L:84]:nvsciipc_ipc_check_end: pid is not 0, but process doesn't exist, (pid:1199)
terminate called after throwing an instance of 'nvstm::ErrnoError'
  what():  Could not open ShmDescriptor; errno: 17 (File exists)

I ran ps -aux | grep stm, and no stm process was found.

Is there any official opinion from the core team about the usage of sudo?

Thanks.