Optical Flow newb questions

I’m by no means an NVIDIA expert. I inherited some code that processes incoming captured video frames (L/R stereo pairs), and all the video frames are stuffed into cv::cuda::GpuMats. Now I need to calculate depth disparity maps from the matched L/R pairs. All of the CUDA image processing code runs on my cv::cuda::Stream, m_CudaStream. When I create the OF (optical flow) components at the start of my processing thread, I need a CUcontext to pass to nvCreateOpticalFlowCuda. If I create this context by calling cuDeviceGet() followed by cuCtxCreate( &context, 0, cuDevice[0] ), it works fine and OF initializes correctly, but all my CUDA image processing code starts to fail. Guessing that this was because I had created another context, I tried calling ::cuCtxPopCurrent( … ) after leaving the OF init routines, and what do you know, my CUDA code worked again. I don’t know what it even MEANS to swap a CUDA context in and out.

Let’s suppose I can only have one context current at a time. I use the OF context to initialize the OF stuff, then I pop that context and get back my old, default one. I do some image processing on the L/R buffers. Now it’s time to calculate the depth disparity map.
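One possible reading of the behaviour described above, offered as a sketch rather than a confirmed diagnosis: the CUDA runtime API (which OpenCV's cv::cuda module is built on) works in the device's primary context, whereas cuCtxCreate makes a brand-new context and pushes it onto the calling thread's context stack, making it the current one. Memory allocated in one context is not valid in another, which would explain why the GpuMat code fails until cuCtxPopCurrent restores the previous context. Assuming that is what is happening, retaining the primary context and passing that to nvCreateOpticalFlowCuda might avoid the swapping altogether. The helper names retainPrimaryContext and primaryContextIsCurrent are hypothetical; the cu* calls are standard driver API.

```cuda
#include <cuda.h>          // driver API: cuInit, cuDeviceGet, cuDevicePrimaryCtxRetain
#include <cuda_runtime.h>  // runtime API: cudaSetDevice, cudaFree

// Hypothetical helper: returns the primary context of the given device ordinal,
// or nullptr on failure. The primary context is the one the runtime API uses,
// so driver-API clients that receive it share memory and streams with runtime code.
CUcontext retainPrimaryContext(int deviceOrdinal)
{
    CUdevice device = 0;
    CUcontext context = nullptr;
    if (cuInit(0) != CUDA_SUCCESS) return nullptr;
    if (cuDeviceGet(&device, deviceOrdinal) != CUDA_SUCCESS) return nullptr;
    if (cuDevicePrimaryCtxRetain(&context, device) != CUDA_SUCCESS) return nullptr;
    return context;
}

// Hypothetical check: true if the context current on this thread is `primary`.
bool primaryContextIsCurrent(CUcontext primary)
{
    CUcontext current = nullptr;
    return cuCtxGetCurrent(&current) == CUDA_SUCCESS && current == primary;
}
```

At the start of the processing thread this could be used as cudaSetDevice( 0 ); cudaFree( 0 ); (to make sure the runtime is initialized), then CUcontext context = retainPrimaryContext( 0 ); and context is what gets passed to nvCreateOpticalFlowCuda. Each successful retain should eventually be balanced by cuDevicePrimaryCtxRelease on the same device.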

At this point, I have a pair of cv::cuda::GpuMats and I need to massage them into these very strange OFGPUBuffer things. My GpuMats are special: I’ve mapped them to DirectX textures. I don’t want to create the OFGPUBuffers up front and run those through my CUDA image processing routines. I’m happier using GpuMats and, at the end, performing a device-side blit into the OFGPUBuffers, then calling the OF stuff to execute(). The trouble is, how do I perform that blit?

Obviously, the code below won’t work, but any suggestions on how to make it work?
Keep in mind I will probably have to swap contexts again to get the OF stuff to execute. Can somebody help explain this to me?

CUarray Input0 = OFFunctions.nvOFGPUBufferGetCUarray( DepthBufferInHandle[0] );
CUarray Input1 = OFFunctions.nvOFGPUBufferGetCUarray( DepthBufferInHandle[1] );

cudaMemcpy2DArrayToArray( 
    (cudaArray_t) Input0, 
    0, 0, 
    (cudaArray_t) ProcessedBufferGpuMat[0].data, // a cv::cuda::GpuMat
    0, 0, 
    GetImageWidth( ), GetImageHeight( ) );

cudaMemcpy2DArrayToArray(
    (cudaArray_t) Input1,
    0, 0,
    (cudaArray_t) ProcessedBufferGpuMat[1].data,
    0, 0,
    GetImageWidth( ), GetImageHeight( ) );

NV_OF_EXECUTE_INPUT_PARAMS InputParams = { 0 };
NV_OF_EXECUTE_OUTPUT_PARAMS OutputParams = { 0 };
InputParams.inputFrame = DepthBufferInHandle[0];
InputParams.referenceFrame = DepthBufferInHandle[1];
OutputParams.outputBuffer = DepthBufferOutHandle;
NV_OF_STATUS status = OFFunctions.nvOFExecute( m_hOpticalFlowHandle, &InputParams, &OutputParams );
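
On the blit itself, here is a minimal sketch under stated assumptions. A GpuMat is pitched linear device memory (a data pointer plus a step in bytes), not a CUDA array, so casting .data to cudaArray_t cannot work with an array-to-array copy. Copying from pitched memory into an array is what cudaMemcpy2DToArrayAsync is for. The sketch assumes the GpuMat and the OF buffer live in the same context, that the GpuMat's pixel format matches the OF buffer format (for example CV_8UC1 for an 8-bit grayscale buffer), and that the CUarray may be cast to cudaArray_t, which the CUDA documentation describes as interchangeable types. The name blitPitchedToArray is hypothetical.

```cuda
#include <cuda.h>          // CUarray
#include <cuda_runtime.h>  // cudaMemcpy2DToArrayAsync
#include <cstddef>

// Hypothetical helper: copies a pitched linear device image into a CUDA array.
//   src            device pointer to the first row (GpuMat::data)
//   srcPitchBytes  distance between rows in bytes (GpuMat::step)
//   widthBytes     bytes per row to copy (GpuMat::cols * GpuMat::elemSize())
//   heightRows     number of rows (GpuMat::rows)
// The copy is queued on `stream` and is asynchronous with respect to the host.
cudaError_t blitPitchedToArray(CUarray dst, const void* src, size_t srcPitchBytes,
                               size_t widthBytes, size_t heightRows,
                               cudaStream_t stream)
{
    return cudaMemcpy2DToArrayAsync(
        reinterpret_cast<cudaArray_t>(dst), 0, 0,   // destination array and x/y offset
        src, srcPitchBytes, widthBytes, heightRows, // pitched source description
        cudaMemcpyDeviceToDevice, stream);
}
```

With the names from the post, a call might look like blitPitchedToArray( Input0, ProcessedBufferGpuMat[0].data, ProcessedBufferGpuMat[0].step, ProcessedBufferGpuMat[0].cols * ProcessedBufferGpuMat[0].elemSize(), ProcessedBufferGpuMat[0].rows, cv::cuda::StreamAccessor::getStream( m_CudaStream ) ). Presumably the copy has to finish before nvOFExecute reads the buffer (for example via m_CudaStream.waitForCompletion()), unless the OF instance is known to run on the same stream.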