[BUG] DO6081/CGF 5.14 app cannot exit gracefully: rerun fails, reports internal errors, and requires a reboot

Required Info:

  • Software Version
    DRIVE OS 6.0.8.1
  • Target OS
    Linux
  • SDK Manager Version
    1.9.2.10884
  • Host Machine Version
    native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers

Describe the bug

As described in two previous topics, the cgf app returns multiple errors when it exits; afterwards a rerun fails and a reboot is required:

  • [BUG] cgf helloworld example 6.0.8.1 return status 1, stm_master errno=11 (Resource temporarily unavailable) - DRIVE AGX Orin / DRIVE AGX Orin General - NVIDIA Developer Forums
  • [BUG] cgf helloworld example 6.0.8.1 throw NvSciIpcOpenEndpointWithEventService() failed; Returned err: 514 - DRIVE AGX Orin / DRIVE AGX Orin General - NVIDIA Developer Forums

To Reproduce

See the two topics referenced under "Describe the bug".

Expected behavior

The simple official cgf helloworld app can be run multiple times, easily and repeatably.

Actual behavior

See the two topics referenced under "Describe the bug".

The helloworld app runs only once after a fresh reboot; it often cannot be rerun and needs another reboot.

Additional context

Let's take the official sample nv_driveworks/driveworks-5.14/bin/run_cgf_demo.sh at main · ZhenshengLee/nv_driveworks · GitHub as an example.

  1. I wrote runHelloworld.sh based on the sample run_cgf_demo.sh. Please check whether it contains any mistake that prevents the app from exiting gracefully.
    runHelloworld.sh.txt (12.5 KB)

  2. Is there a best practice for ensuring a graceful exit? Please show it as a shell script so that developers can reuse it.

Thanks.

Dear @lizhensheng,
Thank you for highlighting the issue. I will check on it and update you.


Friendly ping @SivaRamaKrishnaNV for any updates regarding this error: [BUG] cgf helloworld example 6.0.8.1 return status 1, stm_master errno=11 (Resource temporarily unavailable) - #17 by SivaRamaKrishnaNV

Our tests confirm that when Ctrl-C is pressed during the initialization phase of the cgfNodes, the DriveWorks context is not released properly: the process log never shows "Releasing Driveworks SDK Context", and this ultimately causes the issue where cgf cannot be rerun without a reboot (resource temporarily unavailable).

In contrast, when Ctrl-C is pressed during the schedule phase of the cgfNodes, the DriveWorks context is released correctly: the process log shows "Releasing Driveworks SDK Context", and cgf_app can be relaunched without any issue.
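
The two cases above can be told apart after a run by checking the process log for the release message. A minimal sketch, assuming the log is a plain-text file; the function name dw_context_released and the log path are hypothetical and not part of DriveWorks:

```shell
# Hypothetical helper (not part of DriveWorks): succeeds only if the given
# log file is readable and contains the release message quoted in this thread.
dw_context_released() {
    local log_file="$1"
    [ -r "$log_file" ] && grep -q "Releasing Driveworks SDK Context" "$log_file"
}

# Example use after cgf_app has exited (the log path is an assumption):
#   if ! dw_context_released /path/to/process.log; then
#       echo "DriveWorks context was not released; a rerun may fail" >&2
#   fi
```

A missing message only indicates that the clean shutdown path was not reached; it does not explain why.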

@SivaRamaKrishnaNV

  1. Could you help explain how Ctrl-C is handled?
  2. Please prioritize the resolution of this issue!

Dear @lizhensheng,
Thank you for your investigation. It is indeed valuable. Does that mean that pressing CTRL+C immediately (or a few seconds) after launching the application leads to an erroneous state?

I think the answer is yes; you can reproduce it in your test environment.

A great pity.

Our team has provided a workaround (WAR) for this issue: run nvsciipc_reset before every run of cgf_app.

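# Reset every stm_ NvSciIpc channel listed in /etc/nvsciipc.cfg.
# $OS and $XPLATFORM_COMMON_PATH must be set by the calling script.
__releaseNvSCIIPC() {
    if [[ "$OS" == "Linux" ]]; then
        while read -r line
        do
            # Skip lines that are not INTER_PROCESS entries for stm_ channels.
            needLine=$(echo "$line" | grep INTER_PROCESS | grep stm_)
            [ -z "$needLine" ] && continue
            # Fields 2 and 3 hold the two endpoint names of the channel.
            channel1=$(echo "$line" | awk '{print $2}')
            channel2=$(echo "$line" | awk '{print $3}')
            [ -n "$channel1" ] && "${XPLATFORM_COMMON_PATH}/nvsciipc_reset" -c "$channel1" >/dev/null 2>&1
            [ -n "$channel2" ] && "${XPLATFORM_COMMON_PATH}/nvsciipc_reset" -c "$channel2" >/dev/null 2>&1
        done < /etc/nvsciipc.cfg
    fi
}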
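
Before resetting anything, it can help to see which channels the workaround would touch. The sketch below only lists them and resets nothing. It assumes, as the workaround's field extraction implies, that each relevant line has the form INTER_PROCESS <endpoint1> <endpoint2> ...; the function name list_stm_channels is hypothetical.

```shell
# Hypothetical dry-run helper: print the endpoint names (fields 2 and 3) of
# every line that mentions both INTER_PROCESS and stm_, one name per line.
# The config path defaults to /etc/nvsciipc.cfg, the file used by the workaround.
list_stm_channels() {
    local cfg="${1:-/etc/nvsciipc.cfg}"
    awk '/INTER_PROCESS/ && /stm_/ {
        if ($2 != "") print $2
        if ($3 != "") print $3
    }' "$cfg"
}
```

Running list_stm_channels with no argument prints the channels from /etc/nvsciipc.cfg; each printed name is what the workaround passes to nvsciipc_reset -c.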
