Problems about DRIVE PX 2 camera interface

1. When reading RGBA images from 6 or more cameras, the data rate should be about 1.67 GB/s (i.e. 1920 * 1208 * 4 bytes * 30 FPS * 6 cameras). Is this a bandwidth bottleneck? If not, what is the maximum supported bandwidth?
In practice, we observe a delay when reading from the cameras whenever a memory copy between host and device is in progress. Because of this delay, the camera read rate cannot meet our requirement (e.g. 30 FPS).
We tried to solve this problem using “stream” and “cudaSetDevice”, but neither helps. Do you have any suggestions,
e.g. for improving the parallel performance?
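
As a quick check of the bandwidth estimate in question 1, the sketch below computes the raw data rate of uncompressed RGBA frames. The resolution, bytes per pixel, frame rate and camera count are the figures quoted above; the helper name rawBytesPerSecond is only illustrative and is not part of DriveWorks.

```cpp
#include <cstdint>

// Raw data rate in bytes per second for uncompressed frames.
// Illustrative helper (an assumption of this sketch), not a DriveWorks call.
constexpr std::uint64_t rawBytesPerSecond(std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t bytesPerPixel,
                                          std::uint64_t fps,
                                          std::uint64_t cameras)
{
    return width * height * bytesPerPixel * fps * cameras;
}

// One 1920 x 1208 RGBA frame is 9,277,440 bytes.
static_assert(rawBytesPerSecond(1920, 1208, 4, 1, 1) == 9277440ULL,
              "size of a single RGBA frame");

// Six cameras at 30 FPS: 1,669,939,200 bytes/s, i.e. roughly 1.67 GB/s.
static_assert(rawBytesPerSecond(1920, 1208, 4, 30, 6) == 1669939200ULL,
              "six cameras at 30 FPS");
```

Taking 30 frames per second as the multiplier, the six-camera figure comes to roughly 1.67 GB/s. This counts only one pass over the data, so every additional copy of each frame adds the same amount again.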

For example, there are two threads, A and B.
Thread A reads from the camera as follows:

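dwImageCUDA* img_cuda = nullptr;
// Read the next frame from the camera.
result = dwSensorCamera_readFrame(&frame_handle, sibling, 300000, camera.sensor);
// Get the processed YUV image (NvMedia) from the frame.
result = dwSensorCamera_getImageNvMedia(&frame_nvm_yuv, DW_CAMERA_PROCESSED_IMAGE, frame_handle);
// Convert YUV to RGBA into a separate NvMedia image.
result = dwImageFormatConverter_copyConvertNvMedia(frameNVMrgba,
        frame_nvm_yuv, camera.yuv2rgba);
// Return the camera frame, then stream the RGBA image from NvMedia to CUDA.
result = dwSensorCamera_returnFrame(&frame_handle);
result = dwImageStreamer_postNvMedia(frameNVMrgba, camera.streamer);
if (dwImageStreamer_receiveCUDA(&img_cuda, 60000, camera.streamer) != DW_SUCCESS) {
    // error handling omitted
}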

Thread B copies memory between host and device as follows:
(1) cudaMemcpy2DAsync and cudaStreamSynchronize

However, either (1) or (2) delays reading images from the camera. Is there a solution?

2. How can we compress images asynchronously while reading from the cameras?
We use NvMediaIJPEFeedFrame to compress images synchronously as we read from the cameras,
but each compression takes about 2 ms.
More importantly, this delay grows almost linearly with the number of cameras.
For example, compressing the images from 6 cameras takes about 12 ms in total. In this case, how can we compress the images asynchronously?
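
To make the question concrete, here is a minimal sketch of one generic way to overlap per-camera work in C++: launch one task per camera with std::async and wait for all of them. The function compressFrame is a hypothetical stand-in that only truncates a buffer; it is not the NvMedia API, and whether the hardware JPEG encoder can actually service several requests concurrently is an assumption not confirmed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Hypothetical stand-in for a per-camera encoder call; NOT the NvMedia API.
// It returns the first half of the buffer so that the sketch is runnable.
Frame compressFrame(const Frame& raw)
{
    const auto half = static_cast<std::ptrdiff_t>(raw.size() / 2);
    return Frame(raw.begin(), raw.begin() + half);
}

// Launch one compression task per camera, then wait for all of them.
// The input frames must stay valid until every future has been collected.
std::vector<Frame> compressAll(const std::vector<Frame>& frames)
{
    std::vector<std::future<Frame>> tasks;
    tasks.reserve(frames.size());
    for (const Frame& f : frames) {
        tasks.push_back(std::async(std::launch::async,
                                   [&f] { return compressFrame(f); }));
    }

    std::vector<Frame> out;
    out.reserve(tasks.size());
    for (auto& t : tasks) {
        out.push_back(t.get()); // blocks until this camera's task is done
    }
    return out;
}
```

With this pattern the wall-clock cost approaches that of the slowest single compression rather than the sum, provided the underlying encoder really runs the requests in parallel. Each input frame still has to be held until its task completes.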

3. The CPU usage (i.e. load) rises to 150%-200% on a single core when reading from more than 6 cameras. How can we reduce the CPU usage in this case?
Should we change the FPS parameter of each camera? We tried setting the parameter “serialize-framerate=20” when calling “dwSAL_createSensor”, but it has no effect.

Dear liuli,
In CUDA, a data transfer is truly asynchronous only when copying between the GPU and pinned (page-locked) host memory. Please try pinned memory buffers on the CPU side to improve the data transfer speed.
Since you are using ImageStreamers, the consumer does not release a frame back to the producer until it has finished processing it (here, compressing it). Once the consumer releases the frame back to the producer, it can no longer perform any operations on that frame. Why are you looking for an asynchronous API to compress the data? The consumer cannot release the frame until compression completes anyway.
Can you profile the application and share your findings on why the CPU load is so high? Is any thread in a busy-waiting loop?
Can you look at the camera_gmsl sample for how to change the frame rate parameter using dwSensorSerializer_initialize?