cudaArray/surface -> NvCVImage -> cudaArray/surface

I’m writing a plugin for a host application where I’m provided a cudaArray for incoming image data and another cudaArray (for which I have to make a surface) for outgoing image data.

What is the best way to get data to and from an NvVFX effect? I initially thought I could use NvCVImage_Init() to point directly at the incoming cudaArray and avoid a copy. However, it doesn’t seem to be able to read the data in a cudaArray (as one might expect), and I can’t use a surface since the type is wrong.

Now I’m copying from a CUDA surface into a previously allocated NvCVImage, running the effect, and then copying the outgoing NvCVImage (also previously allocated) back to another surface.
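For context, the two custom copies look roughly like the sketch below. This is a simplified illustration, not my exact plugin code: it assumes an interleaved 8-bit RGBA (uchar4) format and a pitched device buffer whose rows are at least 4-byte aligned, and all function names here are placeholders. In the plugin, the device pointer and pitch would be those of the GPU NvCVImage buffer.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Placeholder helper: wrap a cudaArray in a surface object.
// Assumes the array was allocated with the cudaArraySurfaceLoadStore flag.
inline cudaError_t makeSurface(cudaArray_t array, cudaSurfaceObject_t* surf)
{
    cudaResourceDesc desc = {};
    desc.resType = cudaResourceTypeArray;
    desc.res.array.array = array;
    return cudaCreateSurfaceObject(surf, &desc);
}

// Surface -> pitched linear memory (assumed RGBA8, one thread per pixel).
__global__ void surfaceToPitched(cudaSurfaceObject_t src, unsigned char* dst,
                                 size_t dstPitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // The x coordinate of surf2Dread is a byte offset, not a pixel index.
    uchar4 px = surf2Dread<uchar4>(src, x * (int)sizeof(uchar4), y);
    uchar4* row = reinterpret_cast<uchar4*>(dst + (size_t)y * dstPitch);
    row[x] = px;
}

// Pitched linear memory -> surface (assumed RGBA8, one thread per pixel).
__global__ void pitchedToSurface(const unsigned char* src, size_t srcPitch,
                                 cudaSurfaceObject_t dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    const uchar4* row = reinterpret_cast<const uchar4*>(src + (size_t)y * srcPitch);
    surf2Dwrite(row[x], dst, x * (int)sizeof(uchar4), y);
}

// Placeholder launch wrappers; both run asynchronously on the given stream.
inline cudaError_t copySurfaceToPitched(cudaSurfaceObject_t src, void* dst, size_t dstPitch,
                                        int width, int height, cudaStream_t stream)
{
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    surfaceToPitched<<<grid, block, 0, stream>>>(
        src, static_cast<unsigned char*>(dst), dstPitch, width, height);
    return cudaGetLastError();
}

inline cudaError_t copyPitchedToSurface(const void* src, size_t srcPitch, cudaSurfaceObject_t dst,
                                        int width, int height, cudaStream_t stream)
{
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    pitchedToSurface<<<grid, block, 0, stream>>>(
        static_cast<const unsigned char*>(src), srcPitch, dst, width, height);
    return cudaGetLastError();
}
```

Each frame this means one extra read and one extra write of the full image on each side of the effect, which is the overhead I would like to get rid of.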

A couple of things:

  • I can’t seem to use any built-in cudaMemcpy functions, so I have to use a custom kernel for both copies. Is there something I’m missing?

  • Does the first copy really need to happen, or the last? Since all data is accessed and modified via CUDA, it seems I should be able to at least read the source data directly in the effect.

Are there any suggestions on how to properly read/write an NvCVImage from/to CUDA surfaces or cudaArrays?

Real-time performance is a big factor, so it would be ideal to avoid as much copying and overhead as possible.