Cross-process problem in libargus

The tegra_multimedia_api and Argus samples show interop within a single process:
argus -> IEGLOutputStream -> EGLStream -> acquire frame
or argus -> IEGLOutputStream -> IFrameConsumer -> acquire frame
In the cross-process case (see my other topic, "Cross eglstream copy CUeglFrame failed"), the CPU load is: producer 2.0, consumer 5.3. The consumer only does a GPU-to-CPU copy with cuMemcpy3D:
consumer: cuEGLStreamConsumerAcquireFrame -> cuGraphicsResourceGetMappedEglFrame -> cuMemcpy3D -> cuEGLStreamConsumerReleaseFrame

If I need 3 consumers (YOLO detection, face detection and ROS) for the single Argus camera producer, it will cost much more (2.0 + 3 × 5.3 = 17.9). How can I deal with this in my cross-process setup? Is there any zero-copy method? The YOLO, face-detection and ROS components are developed by different people as 3 separate processes.
Can the IImageNativeBuffer dma-buf be used across processes for zero-copy access?
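Written as a simple linear model (an assumption: each consumer process pays the same copy cost independently of the others), the figures above combine as:

```latex
C_{\text{total}} = C_{\text{producer}} + N \, C_{\text{consumer}} = 2.0 + 3 \times 5.3 = 17.9
```

The $N \, C_{\text{consumer}}$ term is the one zero-copy sharing targets: if each consumer reads the frame in place instead of running its own cuMemcpy3D, that term should shrink toward the cost of each consumer's own processing.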

Have you checked the cudaBayerDemosaic and cudaHistogram MMAPI samples?

cudaHistogram uses 1.3% to 2.0% CPU, so Argus and the EGLStream cost little, and processing the data on the GPU barely costs any CPU.

So the CPU cost comes from the GPU-to-CPU memcpy. For my use case I need to avoid copying the data (zero copy), or copy it only once.
How can I get the best performance with this layout: one process running Argus to produce camera frames, and three consumer processes (YOLO detection, face detection, and ROS)?

The CPU is mainly consumed by the data copy.
Is there a better way for the cross-process use case? Is there a more appropriate technical approach for one producer process and multiple consumer processes?


We're not sure we understand your question correctly.

In /usr/src/jetson_multimedia_api/argus/samples/cudaHistogram, after mapping the buffer with cuGraphicsResourceGetMappedEglFrame, you get a GPU-accessible buffer.

This means you don't need to copy the buffer between the GPU and the CPU.
Instead, just wrap it (here in a CUDA surface object) and use it as zero-copy data:

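// Requires <cuda.h>, <cudaEGL.h> and <string.h>; cuResult, cudaEGLFrame and cudaResource are declared earlier.
// Map the acquired EGLStream frame; this yields a GPU-accessible CUeglFrame without copying.
cuResult = cuGraphicsResourceGetMappedEglFrame(&cudaEGLFrame, cudaResource, 0, 0);
if (cuResult != CUDA_SUCCESS) { /* handle the error */ }
// Wrap plane 0 in a CUDA surface object so kernels can read it in place.
// Assumes cudaEGLFrame.frameType == CU_EGL_FRAME_TYPE_ARRAY; pitch-linear frames use frame.pPitch.
CUDA_RESOURCE_DESC cudaResourceDesc;
memset(&cudaResourceDesc, 0, sizeof(cudaResourceDesc));
cudaResourceDesc.resType = CU_RESOURCE_TYPE_ARRAY;
cudaResourceDesc.res.array.hArray = cudaEGLFrame.frame.pArray[0];
CUsurfObject cudaSurfObj = 0;
cuResult = cuSurfObjectCreate(&cudaSurfObj, &cudaResourceDesc);
// ... launch kernels on cudaSurfObj, then cuSurfObjectDestroy(cudaSurfObj) before releasing the frame.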


In the cudaHistogram sample, the CUDA_RESOURCE_DESC wrapper is used inside one simple application. How can I use that wrapper in a more complex setup? For example:
the cudaHistogram process gets the CUDA_RESOURCE_DESC wrapper;
the YOLO detection process needs the frame data (the wrapped data);
the face detection process also needs the frame data;
the ROS visual navigation process also needs the frame data.
Is there a more appropriate technical approach for one producer process and multiple consumer processes in a cross-process application?
The Argus camera frame data needs to be shared by multiple processes with zero copy.

In fact, if my project were a single process with multiple threads, it would be easy: just use NvBuffer to get a dma-buf fd, and use NvBufferTransform to copy the dma-buf.
But in a cross-process application, I don't see how to share the dma-buf across processes.

Any suggestions?
To be clear: not interop, not multithreading, not a single process, not a single consumer.
It is cross-process: one Argus camera producer and multiple consumers (about 3).


Do you want an example of cross-process NvBuffer access,
or an example that accesses the NvBuffer from multiple threads in the same process?


I think cross-process NvBuffer access can solve this!
But there is no example. Could you give me some advice?

The sample is under internal review. For the implementation, you can check the discussion in the topic
"How to share the buffer in process context?"
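On Linux a dma-buf is an ordinary file descriptor, so a common way to hand it to another process is to send it over a Unix domain socket as SCM_RIGHTS ancillary data; the receiver gets a new descriptor that refers to the same underlying buffer, so nothing is copied. The sketch below shows that mechanism for one producer and several consumers, under stated assumptions: a temporary file stands in for the NvBuffer dma-buf fd so it runs without a Jetson; `send_fd`, `recv_fd`, `share_fd_demo` and `MAX_CONSUMERS` are hypothetical names, not part of any NVIDIA API; and the one-byte 'R' reply is an illustrative release handshake so the producer does not reuse a buffer a consumer is still reading. Whether a received NvBuffer fd can be passed straight to the NvBuffer APIs in the consumer process depends on the L4T release, so check that against the linked topic.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper names; a temporary file stands in for the dma-buf. */
#define MAX_CONSUMERS 8

/* Send one fd plus a 1-byte payload over a connected AF_UNIX socket (SCM_RIGHTS). */
static int send_fd(int sock, int fd)
{
    char tag = 'F';
    struct iovec iov = { .iov_base = &tag, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs a new descriptor in this process that
 * refers to the same open file (for a dma-buf: the same buffer, nothing copied). */
static int recv_fd(int sock)
{
    char tag;
    struct iovec iov = { .iov_base = &tag, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* One producer, n consumer processes. Returns 0 if every consumer read the
 * frame through the fd it received, -1 otherwise. */
int share_fd_demo(int n)
{
    int socks[MAX_CONSUMERS], rc = 0, started = 0;
    pid_t pids[MAX_CONSUMERS];
    if (n <= 0 || n > MAX_CONSUMERS)
        return -1;

    /* Fork consumers before the buffer exists, so the socket is their only way to it. */
    for (; started < n; started++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) { rc = -1; break; }
        pid_t pid = fork();
        if (pid < 0) { close(sv[0]); close(sv[1]); rc = -1; break; }
        if (pid == 0) {                                  /* consumer process */
            for (int j = 0; j < started; j++)
                close(socks[j]);                         /* drop other consumers' sockets */
            close(sv[0]);
            int fd = recv_fd(sv[1]);
            char buf[8] = {0}, ack = 'R';                /* 'R' = frame released */
            int ok = fd >= 0 && pread(fd, buf, 7, 0) == 7 && strcmp(buf, "frame-0") == 0;
            if (ok && write(sv[1], &ack, 1) != 1)
                ok = 0;
            _exit(ok ? 0 : 1);
        }
        close(sv[1]);
        socks[started] = sv[0];
        pids[started] = pid;
    }

    /* Producer: fill the "frame" once and send the same fd to every consumer. */
    FILE *f = (rc == 0) ? tmpfile() : NULL;
    if (f == NULL || pwrite(fileno(f), "frame-0", 7, 0) != 7)
        rc = -1;
    for (int i = 0; i < started; i++)
        if (rc == 0 && send_fd(socks[i], fileno(f)) != 0)
            rc = -1;

    /* Do not reuse the buffer until every consumer has released it. */
    for (int i = 0; i < started; i++) {
        char ack = 0;
        if (rc == 0 && (read(socks[i], &ack, 1) != 1 || ack != 'R'))
            rc = -1;
        close(socks[i]);                                 /* EOF unblocks a waiting consumer */
    }
    for (int i = 0; i < started; i++) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) != pids[i] || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }
    if (f != NULL)
        fclose(f);
    return rc;
}
```

For example, `share_fd_demo(3)` models the three consumers in this thread and returns 0 when all three read the frame through their own received fd. A real producer would likely send a frame index along with each fd and keep a pool of buffers, so it can fill the next frame while consumers still hold the current one.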
