Required Info:
- Software Version: [x] DRIVE OS 6.0.8.1
- Target OS: [x] Linux
- SDK Manager Version: [x] 1.9.2.10884
- Host Machine Version: [x] native Ubuntu Linux 20.04 Host installed with DRIVE OS DOCKER Containers
Describe the bug
Using the helloworld and sum node source code provided with the driveworks-5.14 release samples (nv_driveworks/driveworks-5.14/samples/src/cgf_nodes/HelloWorldNode.hpp at main · ZhenshengLee/nv_driveworks · GitHub, nv_driveworks/driveworks-5.14/samples/src/cgf_nodes/SumNodeImpl.cpp at main · ZhenshengLee/nv_driveworks · GitHub)
and following the official CGF helloworld demo guide (nv_driveworks/drive-agx-orin-doc/3-drive-works/CGF-presentation.pdf at main · ZhenshengLee/nv_driveworks · GitHub), we ran a minimal test of a mailbox + reuse connection and observed unexpected behavior.
To Reproduce
- Build a CGF app with a helloworld process and a sum process.
- For helloworld, the hyperepoch is 10ms*10 frames; for sum, it is 20ms*5 frames (both 100 ms total), so the upstream runs twice as fast as the downstream (see the timing sketch after this list).
- In sumNode, add checking logic that verifies the node never receives the same data/msg twice in consecutive epochs.
- Compile and run the CGF app and check the data consistency.
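As a sketch of the timing assumption behind the check (derived from the hyperepoch settings in step 2; the exact pass times depend on the generated STM schedule):
// helloworldEpoch: period 10 ms, 10 frames per 100 ms hyperepoch -> helloWorldNode runs roughly every 10 ms
// sumEpoch:        period 20 ms,  5 frames per 100 ms hyperepoch -> sumNode runs roughly every 20 ms
// With a mailbox + reuse connection, every sumNode pass should therefore find a sample that was
// produced after its previous pass, so the same value should never be read twice in a row.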
For step 2, the configuration is the following:
app.json
{
"name": "DWCGFHelloworldApp",
"logSpec": "file/rfc5424:{{logpath}}/{{appname}}.log",
"parameters": {},
"requiredSensors": "../../../extra/appCommon/DWCGFImagePipe.required-sensors.json",
"sensorMappingLookups": [
"../../../extra/sensor_mappings"
],
"subcomponents": {
"top": {
"componentType": "./DWCGFHelloworld.graphlet.json",
"parameters": {}
}
},
"connections": [],
"states": {
"STANDARD": {
"stmScheduleKey": "standardSchedule",
"default": true
}
},
"stmSchedules": {
"standardSchedule": {
"wcet": "./DWCGFHelloworld_wcet.yaml",
"hyperepochs": {
"helloworldHyperepoch": {
"period": 100000000,
"epochs": {
"helloworldEpoch": {
"period": 10000000,
"frames": 10,
"passes": [
[
"top.helloWorldNode"
]
]
}
},
"resources": {
"machine0.CPU1": []
}
},
"sumHyperepoch": {
"period": 100000000,
"epochs": {
"sumEpoch": {
"period": 20000000,
"frames": 5,
"passes": [
[
"top.sumNode"
]
]
}
},
"resources": {
"machine0.CPU2": []
}
}
}
}
},
"processes": {
"ssm": {
"runOn": "machine0",
"executable": "vanillassm",
"logSpec": "file:{{logpath}}/{{appname}}.log"
},
"schedule_manager": {
"runOn": "machine0",
"executable": "ScheduleManager",
"argv": {
"--enableScheduleSwitching": "false",
"--scheduleManagerHostIP": "127.0.0.1",
"--scheduleManagerHostPort": "40100",
"--scheduleManagerNumClients": "1"
}
},
"stm_master": {
"runOn": "machine0",
"executable": "stm_master",
"logSpec": "file:{{logpath}}/{{appname}}.log",
"argv": {
"--allow-unregistered-runnables": true,
"--timeout-us": "60000000",
"--soc": "TegraA",
"--core": "0",
"--enable-memlock": true,
"-m": true,
"--log": "./LogFolder/team_node/Helloworld/stm.log",
"--master-forked-log-path": "./LogFolder/team_node/Helloworld",
"--epochs": "300"
}
},
"helloworld_process0": {
"runOn": "machine0",
"executable": "LoaderLite",
"subcomponents": [
"top.helloWorldNode"
]
},
"sum_process0": {
"runOn": "machine0",
"executable": "LoaderLite",
"subcomponents": [
"top.sumNode"
]
}
},
"extraInfo": "../../../extra/appCommon/DWCGFImagePipeExtraInfo.json"
}
graphlet.json
{
"name": "DWCGFHelloworld",
"inputPorts": {},
"outputPorts": {},
"parameters": {
"paraName": { "type": "std::string", "default": "helloworld_name" }
},
"subcomponents": {
"helloWorldNode": {
"componentType": "../../../nodes/example/helloworld/HelloWorldNode.node.json",
"parameters": {
"name": "$paraName"
}
},
"sumNode": {
"componentType": "../../../nodes/example/helloworld/SumNode.node.json"
}
},
"connections": [
{
"src": "helloWorldNode.VALUE_0",
"dests": {
"sumNode.VALUE_0": {
"mailbox": true,
"reuse": true,
}
}
},
{
"src": "helloWorldNode.VALUE_1",
"dests": {
"sumNode.VALUE_1": {
"mailbox": true,
"reuse": true
}
}
}
]
}
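For context, the intent of the two connection parameters, as we understand them from the CGF documentation (our reading, not authoritative):
// "mailbox": true -> non-blocking connection: the consumer reads the most recent sample
//                    available at the start of its pass rather than queueing every sample.
// "reuse":   true -> if no new sample has arrived since the previous pass, the consumer is
//                    handed the previously received sample again instead of getting no data.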
For step 3, the checking code is the following:
if (inPort0.isBufferAvailable() && inPort1.isBufferAvailable())
{
    auto inputValue0 = *inPort0.getBuffer();
    auto inputValue1 = *inPort1.getBuffer();
    DW_LOGD << "[Epoch " << m_epochCount << "]"
            << " Received " << inputValue0 << " from input VALUE_0"
            << ", received " << inputValue1 << " from input VALUE_1."
            << " Add together: " << (inputValue0 + inputValue1) << "!" << Logger::State::endl;
    // Consistency check: with the upstream running twice as fast, VALUE_0 should change every epoch.
    if (m_lastValue0 == inputValue0)
    {
        DW_LOGD << "int32_t is identical with the last one" << Logger::State::endl;
    }
    m_lastValue0 = inputValue0;
    m_lastValue1 = inputValue1;
}
else
{
    DW_LOGD << "[Epoch " << m_epochCount << "] inPort.buffer not available!" << Logger::State::endl;
}
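The snippet above relies on a few members of SumNodeImpl; a minimal sketch of the assumed declarations (the names match the snippet, the exact types used in the sample may differ):
// members of SumNodeImpl used by the consistency check (sketch)
int32_t m_lastValue0{-1};   // last value received on VALUE_0
int32_t m_lastValue1{-1};   // last value received on VALUE_1
uint64_t m_epochCount{0U};  // incremented once per process pass, used only for logging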
Expected behavior
Run for 300 epochs and check the log files: the message "int32_t is identical with the last one" should never appear, because the upstream produces a fresh value every 10 ms while the downstream only consumes one every 20 ms.
Actual behavior
The log does show "int32_t is identical with the last one", which means the message in the downstream sumNode was not updated in time and the old/repeated data was reused by the downstream sumNode.
<13>1 2024-04-19T14:19:44.901606+08:00 - sum_process0 103896 - - [1713507555415171us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 299] Received 598 from input VALUE_0, received 9402 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:44.921565+08:00 - sum_process0 103896 - - [1713507555435129us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 300] Received 599 from input VALUE_0, received 9401 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:44.941576+08:00 - sum_process0 103896 - - [1713507555455140us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 301] Received 601 from input VALUE_0, received 9399 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:44.961625+08:00 - sum_process0 103896 - - [1713507555475189us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 302] Received 604 from input VALUE_0, received 9396 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:44.981537+08:00 - sum_process0 103896 - - [1713507555495100us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 303] Received 604 from input VALUE_0, received 9396 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:44.981562+08:00 - sum_process0 103896 - - [1713507555495128us][DEBUG][tid:38][SumNodeImpl.cpp:144][SumNode] int32_t is identical with the last one
<13>1 2024-04-19T14:19:45.001617+08:00 - sum_process0 103896 - - [1713507555515181us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 304] Received 608 from input VALUE_0, received 9394 from input VALUE_1. Add together: 10002!
<13>1 2024-04-19T14:19:45.021607+08:00 - sum_process0 103896 - - [1713507555535170us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 305] Received 609 from input VALUE_0, received 9391 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:45.041602+08:00 - sum_process0 103896 - - [1713507555555166us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 306] Received 611 from input VALUE_0, received 9389 from input VALUE_1. Add together: 10000!
<13>1 2024-04-19T14:19:45.061610+08:00 - sum_process0 103896 - - [1713507555575173us][DEBUG][tid:38][SumNodeImpl.cpp:138][SumNode] [Epoch 307] Received 613 from input VALUE_0, received 9387 from input VALUE_1. Add together: 10000!
The log archive is uploaded here:
team_node.zip (189.0 KB)
Additional context
This inconsistent behavior does not happen with shm communication, i.e. when we put the helloWorldNode and sumNode into a single process.