How to establish NvSciC2cPcie communication using GPU-allocated buffers

Hello NVIDIA experts,

I’m trying to establish NvSciC2cPcie communication and stream GPU-allocated buffers between a producer and a consumer on different SoCs.
In this case, reconciliation of the GPU buffer attributes appears to fail.
Is this use case supported in DRIVE OS 6.0.6?
If so, could you tell me how to implement it?

Here is what I tried:

  1. I inserted the Linux kernel module and hot-plugged the PCIe link, following Chip to Chip Communication.

  2. I ran the sample application (multi-process CUDA/CUDA stream with one consumer on another SoC), following NvSciStream Performance Test Application.

    On chip s0:

    $ ./nvscistream_event_sample -P 0 nvscic2c_pcie_s0_c5_1 -Q 0 f
    

    On chip s1

    $ ./nvscistream_event_sample -C 0 nvscic2c_pcie_s0_c6_1 -F 0 3
    

    Then, an abort occurs on chip s0.
    The abort is raised by NvSciCommonPanic() inside NvSciBufAttrListReconcile(), which is called at line 154 of drive-linux/samples/nvsci/nvscistream/event/block_pool.c.
    I used the sample application as shipped, without modifications.

Please check whether the issue in “PCIe Hot-Plug not working” has anything to do with this. Thanks.

Dear @VickNV,

I confirmed that PCIe hot-plug works when streaming CPU-allocated buffers between different SoCs.
I also checked the “PCIe Hot-Plug not working” topic, but it doesn’t seem to be related.
The failure seems to depend on whether the buffer is allocated for the CPU or for the GPU.

Here is how I checked it:
On chip s0:

./test_nvscistream_perf -P 0 nvscic2c_pcie_s0_c5_1 -l -b 12.5 -f 10

On chip s1:

./test_nvscistream_perf -C 0 nvscic2c_pcie_s0_c6_1 -l -b 12.5 -f

Thanks.

Please clarify which application you’re using: nvscistream_event_sample or test_nvscistream_perf? Please share the commands you executed along with their full output. Thanks.

Dear @VickNV,

There was an error in my post above. Sorry for the confusion.

I would like to get nvscistream_event_sample working, as described in NvSciStream Sample Application.
The commands I executed and their full output are as follows.

On chip s0:

$ ./nvscistream_event_sample -P 0 nvscic2c_pcie_s0_c5_1 -Q 0 f
Aborted (core dumped)
$ gdb nvscistream_event_sample core.nvscistream_eve.2775 
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from nvscistream_event_sample...
(No debugging symbols found in nvscistream_event_sample)

warning: core file may not match specified executable file.
[New LWP 2775]
[New LWP 2776]
[New LWP 2777]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `./nvscistream_event_sample -P 0 nvscic2c_pcie_s0_c5_1 -Q 0 f'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0xffff94bcc900 (LWP 2775))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000ffff94e5caac in __GI_abort () at abort.c:79
#2  0x0000ffff96baa768 in NvSciCommonPanic () from /usr/lib/libnvscicommon.so.1
#3  0x0000ffff96c5e420 in ?? () from /usr/lib/libnvscibuf.so.1
#4  0x0000ffff96c41580 in ?? () from /usr/lib/libnvscibuf.so.1
#5  0x0000ffff96c42010 in ?? () from /usr/lib/libnvscibuf.so.1
#6  0x0000ffff96c4ad88 in ?? () from /usr/lib/libnvscibuf.so.1
#7  0x0000ffff96c4bb04 in NvSciBufAttrListReconcile () from /usr/lib/libnvscibuf.so.1
#8  0x000000000040825c in handlePool ()
#9  0x0000000000404438 in eventServiceLoop ()
#10 0x0000000000403e58 in main ()

On chip s1:

$ ./nvscistream_event_sample -C 0 nvscic2c_pcie_s0_c6_1 -F 0 3

Regarding the “PCIe Hot-Plug not working” topic you mentioned, I confirmed that hot-plug is working, because test_nvscistream_perf (described in NvSciStream Performance Test Application) runs successfully.

Thanks.

I’ll check this with our team. While we investigate, could you please confirm whether you are using the cable that resolved the issue in “PCIe Hot-Plug not working”, the topic your colleague created earlier?

Dear @VickNV,
I checked with my colleague who created the “PCIe Hot-Plug not working” topic, and we are indeed using the cable that resolved it.

We suspect the abort is caused by both devkits having the same SoC ID.

Have you followed the steps in that topic to assign different SoC IDs to the two devkits? If not, I would suggest you work with @shibata-a to set up the environment first. Thanks.

Dear @VickNV,
We assigned different SoC IDs to the two devkits.
However, we got the same result as above.

To check the SoC IDs of both devkits, please run the following command on each of them and share the output. Thanks.

root@tegra-ubuntu:/home/nvidia# xxd -b /proc/device-tree/soc_id

Hi @kizaki, any update on this? Thanks.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks