When I launched a manually designed CUDA graph composed of two branches (streams), PCIe RX throughput rose much more quickly than with a graph that included only a single stream. I was confused …
By the way, what does the metric PCIe Read Requests to BAF mean, and how can I verify which system factors affect PCIe bandwidth?
@liuyis, can you please take a look at the CUDA graph question in this issue?
Meanwhile, what version of Nsys are you using?
@user122022 Could you share the report file for us to take a closer look?
Regarding "PCIe Read Request to BAF": the name is actually "PCIe Read Requests to BAR1". If you hover over the name, you can see the description: "CPU+Peer Reads from VRAM over PCIe".