Required Info:
- Software Version
  [x] DRIVE OS 6.0.8.1
- Target OS
  [x] Linux
- SDK Manager Version
  [x] 1.9.2.10884
- Host Machine Version
  [x] native Ubuntu Linux 20.04 host installed with DRIVE OS Docker containers
Describe the bug
Using the helloworld and sum sample source code shipped with the driveworks-5.14 release (HelloWorldNode.hpp and SumNodeImpl.cpp under nv_driveworks/driveworks-5.14/samples/src/cgf_nodes in the ZhenshengLee/nv_driveworks GitHub repo) and the official CGF helloworld demo guide https://github.com/ZhenshengLee/nv_driveworks/blob/main/drive-agx-orin-doc/3-drive-works/CGF-presentation.pdf , we ran a minimal test of the nvsci inter-process connection and found unexpected behavior.
To Reproduce
- Step 0: follow the nvsci documentation and add the channel config entries to /etc/nvsciipc.cfg.
- Step 1: follow the official PDF and build a CGF app with a helloworld process and a sum process.
- Step 2: edit the graphlet JSON and change the connection type from the default to nvsci (inter-process), in 3 cases:
  - case A: 1 connection with the default streamName
  - case B: 2 connections with the default streamName
  - case C: 1 connection with a custom streamName
- Step 3: compile and run the CGF app, then check the log and data consistency.
For step 0, append the following channel config entries to /etc/nvsciipc.cfg, e.g. with this shell snippet:
for i in {0..3}; do echo "INTER_PROCESS cgf_${i}_0 cgf_${i}_1 16 4096" | sudo tee -a /etc/nvsciipc.cfg; done
sudo sed -i '$ a INTER_PROCESS cgf_hello_0 cgf_hello_1 16 4096' /etc/nvsciipc.cfg
sudo sed -i '$ a INTER_PROCESS cgf_hello_2 cgf_hello_3 16 4096' /etc/nvsciipc.cfg
sudo sed -i '$ a INTER_PROCESS cgf_hello_4 cgf_hello_5 16 4096' /etc/nvsciipc.cfg
Resulting entries in /etc/nvsciipc.cfg:
INTER_PROCESS cgf_0_0 cgf_0_1 16 4096
INTER_PROCESS cgf_1_0 cgf_1_1 16 4096
INTER_PROCESS cgf_2_0 cgf_2_1 16 4096
INTER_PROCESS cgf_3_0 cgf_3_1 16 4096
INTER_PROCESS cgf_hello_0 cgf_hello_1 16 4096
INTER_PROCESS cgf_hello_2 cgf_hello_3 16 4096
INTER_PROCESS cgf_hello_4 cgf_hello_5 16 4096
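Note that re-running the append commands above adds the same entries again, and duplicated endpoint names in /etc/nvsciipc.cfg can break NvSciIpc initialization. A small sketch to detect duplicates (shown here against a throwaway file; on the target, run the awk pipeline against the real /etc/nvsciipc.cfg):

```shell
# Sketch: list NvSciIpc endpoint names (fields 2 and 3 of INTER_PROCESS lines)
# that appear more than once. Demonstrated on a throwaway copy of the config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
INTER_PROCESS cgf_0_0 cgf_0_1 16 4096
INTER_PROCESS cgf_hello_0 cgf_hello_1 16 4096
EOF
# prints nothing when all endpoint names are unique
awk '$1=="INTER_PROCESS" {print $2; print $3}' "$cfg" | sort | uniq -d
rm -f "$cfg"
```

An empty result means every endpoint name is unique; any name printed appears in more than one channel entry and should be cleaned up before launching the app.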
For step 2, case A: 1 connection with the default streamName.
graphlet.json:
{
    "src": "helloWorldNode.VALUE_0",
    "dests": {
        "sumNode.VALUE_0": {
            "mailbox": true,
            "reuse": true
        }
    },
    "params": {
        "type": "nvsci"
    }
},
{
    "src": "helloWorldNode.VALUE_1",
    "dests": {
        "sumNode.VALUE_1": {
            "mailbox": true,
            "reuse": true
        }
    }
},
For step 2, case B: 2 connections with the default streamName.
graphlet.json:
{
    "src": "helloWorldNode.VALUE_0",
    "dests": {
        "sumNode.VALUE_0": {
            "mailbox": true,
            "reuse": true
        }
    },
    "params": {
        "type": "nvsci"
    }
},
{
    "src": "helloWorldNode.VALUE_1",
    "dests": {
        "sumNode.VALUE_1": {
            "mailbox": true,
            "reuse": true
        }
    },
    "params": {
        "type": "nvsci"
    }
},
For step 2, case C: 1 connection with a custom streamName.
graphlet.json:
{
    "src": "helloWorldNode.VALUE_0",
    "dests": {
        "sumNode.VALUE_0": {
            "mailbox": true,
            "reuse": true
        }
    },
    "params": {
        "type": "nvsci",
        "srcEndpoint": "cgf_hello_0",
        "id": "cgf_hello_0",
        "destEndpoint": "cgf_hello_1"
    }
},
{
    "src": "helloWorldNode.VALUE_1",
    "dests": {
        "sumNode.VALUE_1": {
            "mailbox": true,
            "reuse": true
        }
    }
},
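Case C names its endpoints explicitly, so every endpoint referenced in the graphlet JSON must have a matching entry in /etc/nvsciipc.cfg. A quick sanity-check sketch (demonstrated against a throwaway copy of the config; on the target, point cfg at the real /etc/nvsciipc.cfg and list the endpoints your JSON actually uses):

```shell
# Sketch: confirm each endpoint named in the graphlet JSON (cgf_hello_0 and
# cgf_hello_1 here, per case C above) appears in the nvsciipc config file.
cfg=$(mktemp)
printf 'INTER_PROCESS cgf_hello_0 cgf_hello_1 16 4096\n' > "$cfg"
for ep in cgf_hello_0 cgf_hello_1; do
    if grep -qw "$ep" "$cfg"; then
        echo "$ep: found"
    else
        echo "$ep: MISSING"
    fi
done
rm -f "$cfg"
```

Any "MISSING" line means the graphlet references an endpoint that NvSciIpc does not know about, which would make the channel setup fail before the app can stream data.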
Expected behavior
With the helloworld sample, nvsci inter-process communication should work in CGF in all three cases.
Actual behavior
For case A (1 connection with the default streamName), the CGF app works well.
For cases B and C, the CGF app hangs, and launcher.log reports an exception with DW_INTERNAL_ERROR:
DefaultLogger: [14-05-2024 16:21:16] TopExecutor: set TopExecutor to cpuset /sys/fs/cgroup/cpuset/rr2init/tasks
DefaultLogger: [14-05-2024 16:21:16] Failed to open cpuset /sys/fs/cgroup/cpuset/rr2init/tasks, errno: No such file or directory (2)
DefaultLogger: [14-05-2024 16:21:16] Failed to set TopExecutor to RR2 init cpuset: /sys/fs/cgroup/cpuset/rr2init/tasks
terminate called after throwing an instance of 'dw::core::ExceptionWithStatusCode<int, dw::core::detail::IntToStr>'
what(): 14: apps/roadrunner-2.0/framework/runtime/Runtime.cpp:1466 DriveWorks Error DW_INTERNAL_ERROR: node name: top.helloWorldNode output portID: 0output portName: VALUE_0
[STM] Waiting for STM master to start...
[STM] Master launched
[STM] Performing Client Global Shm Init ...
[STM] Waiting for all clients to start...
_ctx=0xffffadc34e10
Stack trace (most recent call last):
#16 Object "/usr/local/driveworks-5.14/bin/LoaderLite", at 0xaaaab8b78d6f, in $x
Additional context
- Could the forum team please reproduce the issue with the official minimal helloworld sample?
- Please provide the correct way to use the nvsci channel type in dwcgf, along with all the information needed to help us run dwcgf correctly.