I’ve got a question about optimising the transfer time of NumPy arrays. They need to be transferred from the CPU to the GPU and back.
The dtype is uint8, and the transfer takes about 0.1 microseconds per item. The images are monochrome, 1440x1080 pixels, so transferring one image (a NumPy array with 1,555,200 values) takes about 0.16 seconds.
Because the camera delivers 227 frames per second, I get a massive backlog in the queue. The transfer of one frame should take no longer than 0.003 seconds.
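To make the numbers quoted above concrete, here is a rough back-of-the-envelope sketch. The variable names are only for illustration, and the per-item time is the approximate figure I measured, not an exact value.

```python
# Back-of-the-envelope check of the figures quoted above.
width, height = 1440, 1080   # monochrome image, one uint8 value per pixel
time_per_item_s = 0.1e-6     # measured: roughly 0.1 microseconds per item
fps = 227                    # camera frame rate

items_per_frame = width * height                      # values per image
transfer_time_s = items_per_frame * time_per_item_s   # current time per frame
frame_period_s = 1.0 / fps                            # time between two frames
required_bytes_per_s = items_per_frame * fps          # raw pixel data per second

print(f"items per frame:     {items_per_frame}")
print(f"transfer time:       {transfer_time_s:.3f} s")
print(f"frame period:        {frame_period_s:.4f} s")
print(f"required throughput: {required_bytes_per_s / 1e6:.0f} MB/s")
```

So the current transfer is far slower than the interval between two frames, which is why the queue keeps growing.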
Is there a way to get the image data into GPU memory with DMA (Direct Memory Access)?
I run the setup with pypylon and a Basler camera connected via USB 3.1 Gen 1. I’ve tried transferring the data with both OpenCV and PyTorch. The timings are about the same, with no big difference.
Both libraries are self-compiled with CUDA support. I’m using the latest JetPack with Python 3.9.6, OpenCV 4.5.3, and PyTorch 1.9.0.
Thanks for your help in advance!