Transferring Data from CPU to GPU Memory - Speed Up via DMA with a USB Camera?

Hi all,

I have a question about optimising the transfer time of NumPy arrays from CPU to GPU memory and vice versa.

The dtype is uint8, and the transfer takes about 0.1 microseconds per item. My images are monochrome, 1440x1080 pixels, so transferring one image (a NumPy array with 1,555,200 values) takes about 0.16 seconds.

Because the camera delivers 227 frames per second, frames pile up massively in the queue. Transferring one frame should take no longer than 0.003 seconds.

Is there a way to get the image data into GPU memory via DMA (Direct Memory Access)?

I run the setup with pypylon and a Basler camera connected via USB 3.1 Gen 1. I have transferred the data with both OpenCV and PyTorch; the timings are about the same.

Both libraries are self-compiled with CUDA support. I'm using the latest JetPack with Python 3.9.6, OpenCV 4.5.3 and PyTorch 1.9.0.

Thanks for your help in advance!



Could you share an example of your use case?
Do you read the camera input with GStreamer or with OpenCV's default backend?

We have tested your use case with a sample CPU->GPU memory copy.
Copying a uint8 1440x1080 buffer takes ~0.3 ms, far faster than the ~0.16 s you are seeing.

Could you give it a try? The test source is attached as main.cpp.

We tested this in a JetPack 4.6 Xavier NX environment.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
$ nvcc main.cpp -o test
$ ./test
CPU to GPU: 1327 micro second
CPU to GPU: 278 micro second
CPU to GPU: 307 micro second
CPU to GPU: 291 micro second
CPU to GPU: 287 micro second
CPU to GPU: 353 micro second
CPU to GPU: 280 micro second
CPU to GPU: 343 micro second
CPU to GPU: 318 micro second
CPU to GPU: 361 micro second