I ran ./deviceQuery and noticed that this GPU has 3 copy engines. Is there a way to find out the direction of these copy engines other than by writing test code? Thanks!
What do you want to know this for?
I would like to know this for illustrative purposes, and to see whether I can further overlap communication with computation if there is more than one copy engine in one direction.
Overlapping two copies over the same bus/link in the same direction provides no benefit. The extra copy engine is there to facilitate copies over NVLink.
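For anyone who wants to check the first claim on their own hardware, here is a minimal timing sketch. It is only one possible test under stated assumptions, not an official method: the 128 MiB buffer size, the `CHECK` macro and the `timeCopies` helper are made up for illustration. It prints `asyncEngineCount` (the value deviceQuery reports as the number of copy engines), then times two host-to-device copies issued in separate streams against one host-to-device plus one device-to-host copy.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking macro (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: issue two async copies in two streams and return
// the wall-clock time in milliseconds until both have finished.
static double timeCopies(void *dst0, const void *src0, cudaMemcpyKind kind0,
                         void *dst1, const void *src1, cudaMemcpyKind kind1,
                         size_t bytes, cudaStream_t s0, cudaStream_t s1)
{
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(dst0, src0, bytes, kind0, s0));
    CHECK(cudaMemcpyAsync(dst1, src1, bytes, kind1, s1));
    CHECK(cudaDeviceSynchronize());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("%s: asyncEngineCount = %d\n", prop.name, prop.asyncEngineCount);

    const size_t bytes = (size_t)128 << 20;  // 128 MiB per buffer (arbitrary)
    void *h0, *h1, *d0, *d1;
    // Pinned host memory is required for copies to be truly asynchronous.
    CHECK(cudaMallocHost(&h0, bytes));
    CHECK(cudaMallocHost(&h1, bytes));
    CHECK(cudaMalloc(&d0, bytes));
    CHECK(cudaMalloc(&d1, bytes));

    cudaStream_t s0, s1;
    CHECK(cudaStreamCreate(&s0));
    CHECK(cudaStreamCreate(&s1));

    // Warm-up run, result discarded.
    timeCopies(d0, h0, cudaMemcpyHostToDevice,
               h1, d1, cudaMemcpyDeviceToHost, bytes, s0, s1);

    // Two copies in the same direction (both host-to-device).
    double same = timeCopies(d0, h0, cudaMemcpyHostToDevice,
                             d1, h1, cudaMemcpyHostToDevice, bytes, s0, s1);
    // Two copies in opposite directions.
    double opposite = timeCopies(d0, h0, cudaMemcpyHostToDevice,
                                 h1, d1, cudaMemcpyDeviceToHost, bytes, s0, s1);

    printf("H2D + H2D: %.2f ms\n", same);
    printf("H2D + D2H: %.2f ms\n", opposite);

    CHECK(cudaStreamDestroy(s0));
    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaFree(d0));
    CHECK(cudaFree(d1));
    CHECK(cudaFreeHost(h0));
    CHECK(cudaFreeHost(h1));
    return 0;
}
```

If the statement above holds, the same-direction pair should take roughly as long as running the two copies back to back, while the opposite-direction pair can finish in noticeably less time on a full-duplex link. The actual numbers depend on the system.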
Since NVLink (at least on non-POWER hardware) connects GPUs with GPUs, I don’t know whether the copy engine on the reading side or the writing side of the transfer is used. Knowing this is also not really necessary in order to use NVLink. General considerations, however, suggest that the copy engine is better placed on the writing side.
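To illustrate why the engine placement does not matter for usage, here is a minimal sketch of a GPU-to-GPU copy, assuming a system with at least two GPUs. The device indices, the 64 MiB size and the `CHECK` macro are illustrative. The runtime call takes only source, destination and a stream; there is no parameter for selecting a copy engine, and the code itself cannot tell whether the transfer goes over NVLink or PCIe.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking macro (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("Need at least two GPUs for a peer copy.\n");
        return 0;
    }

    const int src = 0, dst = 1;              // assumed device indices
    const size_t bytes = (size_t)64 << 20;   // 64 MiB (arbitrary)

    // Can the destination device directly access the source device's memory?
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, dst, src));
    printf("Device %d can access device %d directly: %s\n",
           dst, src, canAccess ? "yes" : "no");

    void *srcBuf, *dstBuf;
    CHECK(cudaSetDevice(src));
    CHECK(cudaMalloc(&srcBuf, bytes));
    CHECK(cudaMemset(srcBuf, 0, bytes));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(dst));
    CHECK(cudaMalloc(&dstBuf, bytes));
    if (canAccess) {
        // Enables access from the current device (dst) to memory on src.
        CHECK(cudaDeviceEnablePeerAccess(src, 0));
    }

    // The stream belongs to the current device (dst). The API offers no
    // way to say which GPU's copy engine should perform the transfer.
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaMemcpyPeerAsync(dstBuf, dst, srcBuf, src, bytes, stream));
    CHECK(cudaStreamSynchronize(stream));
    printf("Peer copy of %zu bytes finished.\n", bytes);

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dstBuf));
    CHECK(cudaSetDevice(src));
    CHECK(cudaFree(srcBuf));
    return 0;
}
```

The copy still works when direct peer access is not available; in that case the runtime falls back to a slower path.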