PCIe RX throughput rises strangely when running multiple streams in a CUDA graph

user122022 · September 15, 2022, 6:59am

When I launch a manually constructed CUDA graph composed of two branches (streams), PCIe RX throughput rises much more quickly than with a graph that contains only a single stream. I was confused …

By the way, what does the metric "PCIe Read Requests to BAF" mean, and how can I determine which system factors affect PCIe bandwidth?

hwilper · September 20, 2022, 2:57pm

@liuyis, can you please take a look at the CUDA graph question in this issue?

Meanwhile, which version of Nsys are you using?

liuyis · September 20, 2022, 3:22pm

@user122022 Could you share the report file for us to take a closer look?

Regarding "PCIe Read Requests to BAF": the name is actually "PCIe Read Requests to BAR1", and if you hover over the name, you can see the description "CPU+Peer Reads from VRAM over PCIe".

user122022 · September 21, 2022, 12:42am

2022.3.4