NVENC and Synchronization

The documentation does not say what the synchronization rules are between a CUDA buffer and NVENC. In my case I do this:
thread 1:
renderToTextureInGL()
mapTextureAsCudaArray()
cudaKernelToConvertRGBATextureToNV12CUDABuffer(texture, cudaBuffer)
queueCUDABufferForEncode(cudaBuffer)

thread 2:
waitForCUDABufferToBeQueued()
nvEncMapInputResource(cudaBuffer)
nvEncEncodePicture(cudaBuffer)

I do this because the NVENC calls take > 1 ms.

When initializing NVENC I give it the same CUcontext that is being used in thread 1. As far as I can tell I do need extra synchronization here. Is that correct? Which synchronization method should I use? It seems like I should use a cudaEvent_t, however I can’t specify a stream for NVENC, which means I need to call cudaStreamWaitEvent(NULL, event), but that seems less than ideal.
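To make the question concrete, here is a minimal sketch of the two-thread handoff in plain C++. It only models the queueing, not the GPU work: `Frame`, `FrameQueue`, and `runPipeline` are hypothetical names, and the `conversionDone` flag stands in for a cudaEvent_t that thread 1 would record after the conversion kernel. The comments mark where the CUDA and NVENC calls would go.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for an NV12 CUDA buffer handed from thread 1 to thread 2.
struct Frame {
    int id;
    bool conversionDone;  // real code: a cudaEvent_t recorded after the kernel
};

// Simple blocking queue shared by the render thread and the encode thread.
class FrameQueue {
public:
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(f);
        }
        cv_.notify_one();
    }

    // Signals that no more frames will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a frame is available; returns false once closed and drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Frame> q_;
    bool closed_ = false;
};

// Runs the pipeline for frameCount frames and returns the ids in encode order.
std::vector<int> runPipeline(int frameCount) {
    FrameQueue queue;
    std::vector<int> encoded;

    // Thread 2: waits for queued buffers and "encodes" them.
    std::thread encoder([&] {
        Frame f;
        while (queue.pop(f)) {
            // Real code: wait on the frame's event here, then call
            // nvEncMapInputResource and nvEncEncodePicture.
            assert(f.conversionDone);
            encoded.push_back(f.id);
        }
    });

    // Thread 1: render, map, convert, then queue the buffer.
    for (int i = 0; i < frameCount; ++i) {
        // Real code: GL render, map texture, launch the RGBA-to-NV12 kernel,
        // then record an event on the stream that ran the kernel.
        queue.push(Frame{i, true});
    }
    queue.close();
    encoder.join();
    return encoded;
}
```

Calling `runPipeline(5)` pushes five frames through the queue in order. The mutex and condition variable only order the CPU side of the handoff; whether the GPU work on the buffer has finished by the time thread 2 maps it is exactly the open question above.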

Thoughts?

Here we have the same question.

We have been using NVENC in our product for a while, but in the new version we are working on, the program hangs inside the call to nvEncMapInputResource at random times (each time it happens after a different number of frames).

It seems like a race condition, and that makes us wonder whether we understood the NVENC documentation properly.

In the documentation about nvEncMapInputResource, it says “This function provides synchronization guarantee that any direct3d or cuda work submitted on the input buffer is completed before the buffer is used for encoding”.

Questions:

1. Does “work” include asynchronous memory transfers?
2. Does it synchronize with any kernel or memory transfer when using streams?

Thanks

This might be interesting for someone.

We found the source of the deadlock we had on nvEncMapInputResource.

We were making some indirect CUDA calls from a CUDA callback. The callback started a thread that executed some CUDA event related calls. It seems this was creating a deadlock in the CUDA runtime when nvEncMapInputResource was called.

Removing this callback made everything work again. Although the deadlock happened only on certain systems and at random times, the documentation does explain that CUDA callbacks should not call any other CUDA API function, directly or indirectly, or a deadlock may occur.
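For anyone who cannot simply remove the callback, one common alternative (not what we did) is to have the callback only post a task to a worker thread that already exists, and to make the CUDA calls from that worker. The sketch below is plain C++ under that assumption: `DeferredWorker`, `onFrameDone`, and `runDeferred` are hypothetical names, and `onFrameDone` stands in for the body of a callback registered with something like cudaStreamAddCallback.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Worker thread that runs posted tasks in order, outside of any callback.
class DeferredWorker {
public:
    DeferredWorker() : thread_([this] { run(); }) {}
    ~DeferredWorker() { stop(); }

    // Safe to call from a callback: it only touches the queue.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!stopping_);  // posting after stop() is a usage error
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // Drains the remaining tasks, then joins the worker thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs without holding the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the other members exist first
};

// Hypothetical callback body: it makes no CUDA calls, it only enqueues work.
void onFrameDone(DeferredWorker& worker, std::vector<int>& log, int frameId) {
    worker.post([&log, frameId] {
        // CUDA event calls would go here, on the worker thread.
        log.push_back(frameId);
    });
}

// Simulates n callback invocations and returns the order the work ran in.
std::vector<int> runDeferred(int n) {
    std::vector<int> log;
    {
        DeferredWorker worker;
        for (int i = 0; i < n; ++i) onFrameDone(worker, log, i);
    }  // destructor drains the queue and joins the worker
    return log;
}
```

The point of the design is that the callback returns immediately and never enters the CUDA runtime itself. This is only a sketch of the pattern; we have not verified that it avoids the deadlock in every case.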

Regarding nvEncMapInputResource, we would still like an answer to the questions in the previous post.

Thanks