I ran deviceQuery on the AGX Orin evaluation board and got the following:
“Concurrent copy and kernel execution: Yes with 2 copy engine(s)”
Does a copy engine mean a DMA engine for a one-way copy (host to device or device to host), or does each engine contain 2 DMAs? Can memory operations run concurrently?
Thank you for the answer.
I understand from that post that concurrent and sequential copies “from device to host” have the same performance, right?
Just so I fully understand: what does “2 copy engines” mean? Can 2 such copies in fact occur concurrently?
Copy engines are hardware units in the GPU (DMA engines) that supervise data transfer between host and device (or between devices in non-Jetson settings).
A GPU with two or more copy engines can run a host->device transfer and a device->host transfer simultaneously (and, for that matter, concurrently with kernel execution and other host CPU activity).
A GPU with only 1 copy engine can run a host->device transfer, or a device->host transfer, but not both simultaneously.
Each copy engine contains one DMA engine. Running host->device and device->host transfers at the same time requires 2 copy engines.
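As a rough illustration of the bidirectional case, here is a minimal sketch, not a definitive test. It reads asyncEngineCount (the field deviceQuery reports as the number of copy engines) and then issues one host->device and one device->host copy in two different streams. The buffer size, the stream names (up, down) and the CHECK macro are my own choices for the example. Whether the two copies actually overlap on a given board is something to confirm with a profiler.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro for this sketch: bail out of main on any CUDA error.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            return 1;                                                \
        }                                                            \
    } while (0)

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    // asyncEngineCount is the value deviceQuery prints as "copy engine(s)".
    std::printf("copy engines: %d\n", prop.asyncEngineCount);

    const size_t bytes = 64u << 20;  // 64 MiB per buffer, chosen arbitrarily
    float *h_in = nullptr, *h_out = nullptr, *d_in = nullptr, *d_out = nullptr;

    // Pinned (page-locked) host memory is needed for cudaMemcpyAsync
    // to run asynchronously with respect to the host.
    CHECK(cudaMallocHost(&h_in, bytes));
    CHECK(cudaMallocHost(&h_out, bytes));
    CHECK(cudaMalloc(&d_in, bytes));
    CHECK(cudaMalloc(&d_out, bytes));
    std::memset(h_in, 0, bytes);
    CHECK(cudaMemset(d_out, 0, bytes));

    cudaStream_t up, down;
    CHECK(cudaStreamCreate(&up));
    CHECK(cudaStreamCreate(&down));

    // Opposite directions in different streams: with 2 copy engines these
    // two transfers are allowed to run at the same time.
    CHECK(cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, up));
    CHECK(cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, down));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaStreamDestroy(up));
    CHECK(cudaStreamDestroy(down));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return 0;
}
```

With a single copy engine the same code still runs correctly; the two copies would simply be carried out one after the other.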
Thank you Robert. If I have 2 copy engines, can two separate copy operations, both of type “host to device”, from different streams occur in parallel?
Generally I wouldn’t expect that. This is what was covered in the linked article. The general behavior I would expect is that one transfer would take place, followed by the other. This isn’t a function of the copy engines per se, but the behavior of the link connecting the GPU to the CPU. Think of it as a water pipe, with two tanks to drain. If you drain one tank, then the other, it is not really better or different than if you try to drain both tanks at the same time. So as a practical matter we don’t specify what will happen precisely. They might appear to be running “in parallel” depending on how you look at it, but the general behavior I expect to see is one transfer followed by the other.
You can run a profiler to see how the profiler depicts it, if you like.
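If you want a number to go with the profiler picture, something like the following sketch could be used. It issues two host->device copies, first both in the same stream and then in two different streams, and compares host wall-clock time. The helper name timeCopies, the buffer size and the use of std::chrono are my own assumptions for the example, and error checking is left out to keep it short. I would expect the two timings to be close, for the reason described above, but I am not stating a result here.

```cuda
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: issue two host->device copies on streams s0 and s1
// and return the elapsed host wall-clock time in milliseconds.
static double timeCopies(void* d0, void* d1, const void* h0, const void* h1,
                         size_t bytes, cudaStream_t s0, cudaStream_t s1) {
    cudaDeviceSynchronize();  // make sure no earlier work is still in flight
    auto t0 = std::chrono::steady_clock::now();
    cudaMemcpyAsync(d0, h0, bytes, cudaMemcpyHostToDevice, s0);
    cudaMemcpyAsync(d1, h1, bytes, cudaMemcpyHostToDevice, s1);
    cudaDeviceSynchronize();  // wait for both copies to finish
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB per copy, chosen arbitrarily
    void *h0 = nullptr, *h1 = nullptr, *d0 = nullptr, *d1 = nullptr;

    // Pinned host buffers so the async copies do not fall back to staging.
    cudaMallocHost(&h0, bytes);
    cudaMallocHost(&h1, bytes);
    cudaMalloc(&d0, bytes);
    cudaMalloc(&d1, bytes);
    std::memset(h0, 0, bytes);
    std::memset(h1, 0, bytes);

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Warm-up run so one-time setup costs are not included in the timings.
    timeCopies(d0, d1, h0, h1, bytes, s0, s1);

    // Both copies in one stream: they are serialized by stream order.
    double sameStream = timeCopies(d0, d1, h0, h1, bytes, s0, s0);
    // One copy per stream: the runtime is free to overlap them if it can.
    double twoStreams = timeCopies(d0, d1, h0, h1, bytes, s0, s1);

    std::printf("same stream: %.3f ms\n", sameStream);
    std::printf("two streams: %.3f ms\n", twoStreams);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d0);
    cudaFree(d1);
    cudaFreeHost(h0);
    cudaFreeHost(h1);
    return 0;
}
```

Running the same binary under a profiler such as Nsight Systems (nsys profile ./a.out) should show on the timeline whether the two copies are drawn back to back or side by side.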