Please provide the following info (tick the boxes after creating this topic):
Software Version
[o] DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other
Target Operating System
[o] Linux
QNX
other
Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
[o] DRIVE AGX Orin Developer Kit (not sure its number)
other
SDK Manager Version
1.9.2.10884
[o] other
Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
[o] native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
Hello,
I am trying to follow this documentation: https://developer.nvidia.com/docs/drive/drive-os/6.0.6/public/driveworks-nvcgf/index.html to set up a custom C++ node that logs a simple string using dw::core::Logger in its PROCESS pass. However, the documentation for integrating it with the STM and the Schedule Manager seems to be outdated. Unlike in the documentation, the system description has moved from *.system.json to *.app.json, and the generated default YAML files are no longer *.yaml but *__standardSchedule.yaml. I tried running a custom hello world application, but I am getting errors from the Schedule Manager/STM, and the docs offer limited information for debugging them. I am hoping to get some help with this.
Thank you for reaching out and providing the relevant files and information. I’ll check with our team about the outdated documentation issue.
To assist you better, could you please provide more details about the specific errors you encountered? Any additional information or error messages you can share will be helpful in understanding the issue and providing appropriate guidance for debugging.
Please take a peek at the DriveWorks 5.12 documentation, which is compatible with DRIVE OS 6.0.7 (although DRIVE OS 6.0.7 is not a DevZone release), and check whether the documentation issues you identified have been resolved in that version.
[STM][ERROR] Failed to receive mqueue message; errno: 110 (Connection timed out)
[STM ERROR]:[av/stm/runtime/src/master/main.c][main] [689]: Could not receive message from CGF-ScheduleManager. This may be caused by CGF-ScheduleManager having crashed before its call to 'stmScheduleManagerInit()' - please check its health.
av/stm/runtime/src/master/main.c:690 assertion failure, errno=110 (Connection timed out)
The Schedule Manager shows the following and gets stuck waiting for clients:
<15>1 2023-06-19T23:13:27.456838Z - schedule_manager 17256 - - [0us][VERBOSE][tid:1][SocketClientServer.cpp:235][NO_TAG] SocketServer(@port:40100): accepted 127.0.0.1:46973
<15>1 2023-06-19T23:13:27.456877Z - schedule_manager 17256 - - [0us][VERBOSE][tid:1][ChannelSocket.hpp:613][ChannelSocketProducer] Connection for port 40100 accepted; status=DW_SUCCESS(0)
<15>1 2023-06-19T23:13:27.456912Z - schedule_manager 17256 - - [0us][VERBOSE][tid:1][ChannelSocket.hpp:658][ChannelSocketProducer] Connection for port 40100 sent metadata
<13>1 2023-06-19T23:13:27.462672Z - schedule_manager 17256 - - [0us][DEBUG][tid:0][ScheduleManager.cpp:123][ScheduleManager] waiting for the clients to connect..
<13>1 2023-06-19T23:13:27.472761Z - schedule_manager 17256 - - [0us][DEBUG][tid:0][ScheduleManager.cpp:123][ScheduleManager] waiting for the clients to connect..
<13>1 2023-06-19T23:13:27.482950Z - schedule_manager 17256 - - [0us][DEBUG][tid:0][ScheduleManager.cpp:123][ScheduleManager] waiting for the clients to connect..
And here are the logs from the node process (jackio_master):
<14>1 2023-06-19T23:13:27.474577Z - jackio_master 17255 - - [1687216407474573us][INFO][tid:scheduleManagerReceiver][ScheduleManagerReceiver.cpp:169][Receiver] [Receiver] connected
<13>1 2023-06-19T23:13:27.474632Z - jackio_master 17255 - - [1687216407474632us][DEBUG][tid:scheduleManagerReceiver][ScheduleManagerReceiver.cpp:87][Receiver] [Sender] semaphore name: /cgf_schedulemanager_semaphore
<12>1 2023-06-19T23:13:44.158081Z - jackio_master 17255 - - [1687216424158074us][WARN][tid:38][HealthService.cpp:222][EPL_Interface] Connection with a SEH x86 Client timed out after 15000 ms. Please check if a SEH x86 Client is launched.
<13>1 2023-06-19T23:13:44.158146Z - jackio_master 17255 - - [1687216424158146us][DEBUG][tid:38][ChannelConnectorImpl.cpp:103][ChannelConnector] ChannelConnector: thread 140717380853760 stopping producer and consumer connect threads
<12>1 2023-06-19T23:13:44.180478Z - jackio_master 17255 - - [1687216424180475us][WARN][tid:parameterServiceLifeCycle][ParameterServerImpl.cpp:335][DynamicParameterServer] Connection with a Dynamic parameter Client timed out after 15000 ms. Please check if a Dynamic Parameter Client is launched.
<13>1 2023-06-19T23:13:44.180489Z - jackio_master 17255 - - [1687216424180489us][DEBUG][tid:parameterServiceLifeCycle][ChannelConnectorImpl.cpp:103][ChannelConnector] ChannelConnector: thread 140716407775232 stopping producer and consumer connect threads
<12>1 2023-06-19T23:13:44.341393Z - jackio_master 17255 - - [1687216424341386us][WARN][tid:38][TopExecutor.hpp:3775][TopExecutor] Failed to connect x86 SEH communication with error DW_TIME_OUT. Please ignore this error message if SEH x86 is not launched with RoadRunner.
<12>1 2023-06-19T23:13:44.477957Z - jackio_master 17255 - - [1687216424477949us][WARN][tid:parameterServiceLifeCycle][TopExecutor.hpp:3749][TopExecutor] Failed to start Dynamic Parameter Service with error DW_TIME_OUT. Please ignore this error message if a Dynamic Parameter Client is not launched with RoadRunner.
<11>1 2023-06-19T23:13:54.649296Z - jackio_master 17255 - - [1687216434649289us][ERROR][tid:rr2_main][TopExecutor.hpp:539][TopExecutor] Caught signal 15 sent by pid 17253
<13>1 2023-06-19T23:13:54.649373Z - jackio_master 17255 - - [1687216434649373us][DEBUG][tid:rr2_main][TopExecutor.hpp:563][TopExecutor] Caught user interruption signal
<13>1 2023-06-19T23:13:54.649390Z - jackio_master 17255 - - [1687216434649390us][DEBUG][tid:rr2_main][TopExecutor.hpp:2407][TopExecutor] TopExecutor: main thread informs stm server to exit schedule
<13>1 2023-06-19T23:13:54.724322Z - jackio_master 17255 - - [1687216434724312us][DEBUG][tid:scheduleManagerReceiver][ScheduleManagerReceiver.cpp:206][Receiver] [Receiver] done exiting!
Meanwhile, I will look into the 5.12 documentation. I am currently running the DRIVE OS 6.0.6 Docker container.
Yes, that is the tutorial I followed to integrate my custom node into an app.
Could you give some pointers on what the jackio_master error logs mean, or on the possible root causes of the Schedule Manager waiting for a client to connect even though the first line of the jackio_master log shows that this client is connected to the Schedule Manager? Thanks.
Also, according to the 5.12 documentation, the command to run the sample application is sudo /usr/local/driveworks/bin/run_cgf_demo.sh, but this script does not exist in DRIVE OS 6.0.6. The demo pipeline script provided in DRIVE OS 6.0.6 does not work inside the Docker container. nvidia-smi shows the GPU and CUDA, and the container has access to the display. I haven't made a custom Dockerfile; I'm using the container from NGC directly.
Dear @stefan65,
We have not tested the CGF demo in Docker. Could you use the steps in the attached presentation as a reference to run the sample on the target, give it a try, and share the complete logs if you run into issues? CGF-presentation.pdf (1.2 MB)
The PDF seems helpful. Could you provide the HelloWorld.app.json used in the tutorial, or the source code in general, so that I can run it? Most of the steps are covered in the PDF except the creation of HelloWorld.app.json. Thanks.
Dear @stefan65,
The PDF has steps to integrate a custom node. You can follow the same steps (the DW 5.10 documentation has some issues) and give it a try. Let us know if you notice any issues.
By dual DRIVE Orin, I mean two DRIVE Orin devkits connected via PCIe.
Can CGF run across two DRIVE Orin devkits with inter-machine communication on DRIVE OS 6.0.6?
I am curious about the use of ROS 2 in the repository you sent. Do we really need ROS 2 as middleware to run a complete workflow for a robotic system? Are there NVIDIA tools that do what ROS 2 does, for example launching nodes (what roslaunch does) and running a complete state machine or behavior tree for a robot?