Real-time Video Super-Resolution too slow

Hello, I'm trying to use the SDK to do real-time video super-resolution, but unfortunately upscaling a 1080p source by 2x (1920x800 to 3840x1600) takes between 39 and 45 ms per frame on my 2080 Ti.

That works out to only about 22 to 25 fps, which is too slow for real time (I'm targeting 24 fps), because I still need to do some basic manipulation of the image before rendering.
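For reference, the frame-time arithmetic behind these numbers (frame rate is the reciprocal of the per-frame time):

```latex
\text{fps} = \frac{1000\ \text{ms/s}}{t_{\text{frame}}\ [\text{ms}]},\qquad
\frac{1000}{45} \approx 22.2\ \text{fps},\qquad
\frac{1000}{39} \approx 25.6\ \text{fps},\qquad
t_{\text{budget}}(24\ \text{fps}) = \frac{1000}{24} \approx 41.7\ \text{ms}
```

So at 45 ms the upscale alone already exceeds the 24 fps budget, and at 39 ms it leaves under 3 ms for everything else.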

I noticed that most of the time is spent transferring the buffer from the device to the host at the end of the process (using cudaMemcpy or cudaMemcpy2D), as if the copy were waiting for the NGX processing to finish before it could copy the buffer back to the host.
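One way to check this suspicion is to time the processing step and the copy separately with CUDA events, and compare both with the host wall-clock time spent inside cudaMemcpy. The sketch below is not NGX code: `fakeUpscaleKernel` is a hypothetical stand-in for the NGX call (a nearest-neighbour 2x upscale of an RGBA8 image launched on the null stream), and `timeProcessAndCopy` / `Timings` are illustrative names.

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::abort();                                             \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the NGX super-resolution step (NOT the NGX API):
// 2x nearest-neighbour upscale of an RGBA8 image, launched on the null stream.
__global__ void fakeUpscaleKernel(const uchar4* src, int srcW,
                                  uchar4* dst, int dstW, int dstH) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < dstW && y < dstH) dst[y * dstW + x] = src[(y / 2) * srcW + x / 2];
}

struct Timings {
    float gpuProcessMs;     // GPU time of the processing step alone
    float gpuCopyMs;        // GPU time of the readback (plus a few microseconds of host round trip)
    double hostCopyCallMs;  // wall-clock time the host spends inside cudaMemcpy
};

Timings timeProcessAndCopy(int srcW, int srcH) {
    assert(srcW > 0 && srcH > 0);
    const int dstW = 2 * srcW, dstH = 2 * srcH;
    const size_t srcBytes = size_t(srcW) * srcH * sizeof(uchar4);
    const size_t dstBytes = size_t(dstW) * dstH * sizeof(uchar4);

    uchar4 *dSrc = nullptr, *dDst = nullptr;
    CHECK(cudaMalloc(reinterpret_cast<void**>(&dSrc), srcBytes));
    CHECK(cudaMalloc(reinterpret_cast<void**>(&dDst), dstBytes));
    CHECK(cudaMemset(dSrc, 0, srcBytes));
    std::vector<uchar4> hDst(size_t(dstW) * dstH);  // ordinary pageable host memory

    cudaEvent_t start, processed, copied;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&processed));
    CHECK(cudaEventCreate(&copied));

    dim3 block(16, 16);
    dim3 grid((dstW + block.x - 1) / block.x, (dstH + block.y - 1) / block.y);

    CHECK(cudaEventRecord(start, 0));
    fakeUpscaleKernel<<<grid, block>>>(dSrc, srcW, dDst, dstW, dstH);  // returns immediately
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(processed, 0));

    auto t0 = std::chrono::steady_clock::now();
    // Blocking call on the null stream: it first waits for the kernel above to finish.
    CHECK(cudaMemcpy(hDst.data(), dDst, dstBytes, cudaMemcpyDeviceToHost));
    auto t1 = std::chrono::steady_clock::now();
    CHECK(cudaEventRecord(copied, 0));
    CHECK(cudaEventSynchronize(copied));

    Timings t{};
    CHECK(cudaEventElapsedTime(&t.gpuProcessMs, start, processed));
    CHECK(cudaEventElapsedTime(&t.gpuCopyMs, processed, copied));
    t.hostCopyCallMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(processed));
    CHECK(cudaEventDestroy(copied));
    CHECK(cudaFree(dSrc));
    CHECK(cudaFree(dDst));
    return t;
}
```

Call `timeProcessAndCopy(1920, 800)` from `main` and print the three fields. If `hostCopyCallMs` comes out close to `gpuProcessMs + gpuCopyMs`, the "slow copy" is mostly the host waiting for the processing to finish, not the transfer itself.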

Can someone help me with this? Thanks.

No one here? Is NGX development dead?

Hi Felix

I'm not sure exactly what your pipeline looks like, but I'm assuming it needs a device-to-host copy (and that the copy is absolutely necessary). cudaMemcpy2D is a blocking call (with respect to the host), and in your case it is issued on the default (null) stream. As a result, the copy waits for the NGX processing to complete, because NGX processing for video super-resolution also runs on the default stream.
You could try using cudaMemcpyAsync on the null stream for the device-to-host copy. Note that your host memory must then be pinned (non-pageable), i.e. either allocated with cudaHostAlloc or registered with cudaHostRegister. You will of course need to synchronize the stream (cudaStreamSynchronize) before you access the memory on the host.
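A minimal sketch of that suggestion, assuming an RGBA8 (`uchar4`) output held in a pitched device buffer. Since the original copy used cudaMemcpy2D, this uses its asynchronous counterpart cudaMemcpy2DAsync. `readbackAsync` and `demoAsyncReadback` are hypothetical names, and `cudaMemset2D` merely stands in for the NGX output so the example is self-contained.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::abort();                                             \
        }                                                             \
    } while (0)

// Issues the device-to-host copy on the null stream without blocking the host.
// hPinned must be page-locked (cudaHostAlloc, or registered with cudaHostRegister);
// with pageable memory a device-to-host copy returns only once it has completed.
cudaError_t readbackAsync(uchar4* hPinned, size_t hostPitch,
                          const uchar4* dImg, size_t devPitch,
                          int width, int height) {
    return cudaMemcpy2DAsync(hPinned, hostPitch, dImg, devPitch,
                             size_t(width) * sizeof(uchar4), size_t(height),
                             cudaMemcpyDeviceToHost, 0 /* null stream */);
}

// Returns the last channel of the last pixel read back (expected: 7).
int demoAsyncReadback(int width, int height) {
    const size_t rowBytes = size_t(width) * sizeof(uchar4);

    uchar4* dImg = nullptr;
    size_t devPitch = 0;
    CHECK(cudaMallocPitch(reinterpret_cast<void**>(&dImg), &devPitch, rowBytes, height));
    assert(devPitch >= rowBytes);
    // Stand-in for the upscaled NGX output: every byte set to 7, on the null stream.
    CHECK(cudaMemset2D(dImg, devPitch, 7, rowBytes, height));

    uchar4* hPinned = nullptr;  // page-locked host buffer (tightly packed rows)
    CHECK(cudaHostAlloc(reinterpret_cast<void**>(&hPinned), rowBytes * height,
                        cudaHostAllocDefault));

    CHECK(readbackAsync(hPinned, rowBytes, dImg, devPitch, width, height));
    // The host thread is free here, e.g. to prepare the next frame's input.
    CHECK(cudaStreamSynchronize(0));  // required before reading hPinned on the host

    int last = hPinned[size_t(width) * height - 1].w;
    CHECK(cudaFreeHost(hPinned));
    CHECK(cudaFree(dImg));
    return last;
}
```

The demo allocates the pinned buffer inside the function only to stay self-contained; since page-locking memory is comparatively expensive, a real pipeline would allocate it once and reuse it for every frame.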

Awesome, thanks for the reply. That explains the blocking behavior I'm seeing.

I will implement the async pipeline you describe, BUT it won't do much to improve the 45 ms it takes to fully process a 1080p frame, right?
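On that question: an asynchronous readback does not make the GPU work per frame any shorter. What it can buy is overlap. The host post-processes frame N while the GPU works on frame N+1, so the frame period drops from roughly (GPU time + host time) to roughly max(GPU time, host time). If the upscale alone takes longer than about 41.7 ms, no amount of overlap reaches 24 fps. Below is a minimal double-buffered sketch of that pattern. It assumes a hypothetical `fakeProcessKernel` standing in for the NGX call (kept on the null stream) and a hypothetical `hostWork` for the per-frame CPU processing; `runPipeline` is likewise an illustrative name.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::abort();                                             \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the NGX step: fills the frame with value v.
__global__ void fakeProcessKernel(uchar4* dst, int n, unsigned char v) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = make_uchar4(v, v, v, 255);
}

// Hypothetical host-side post-processing; here it only checks the frame content.
static bool hostWork(const uchar4* frame, int n, unsigned char expected) {
    return frame[0].x == expected && frame[n - 1].x == expected;
}

// Returns how many frames passed hostWork.
int runPipeline(int width, int height, int numFrames) {
    assert(width > 0 && height > 0 && numFrames >= 0);
    const int n = width * height;
    const size_t bytes = size_t(n) * sizeof(uchar4);

    uchar4* dOut = nullptr;  // one device buffer is enough: the stream serializes its use
    CHECK(cudaMalloc(reinterpret_cast<void**>(&dOut), bytes));

    uchar4* hBuf[2];         // two pinned host buffers, used alternately
    cudaEvent_t ready[2];    // signals that hBuf[i] holds a finished frame
    for (int b = 0; b < 2; ++b) {
        CHECK(cudaHostAlloc(reinterpret_cast<void**>(&hBuf[b]), bytes, cudaHostAllocDefault));
        CHECK(cudaEventCreateWithFlags(&ready[b], cudaEventDisableTiming));
    }

    // Queue "process frame f, copy it into hBuf[f % 2]" on the null stream.
    auto enqueue = [&](int f) {
        const int threads = 256;
        fakeProcessKernel<<<(n + threads - 1) / threads, threads>>>(dOut, n, (unsigned char)f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hBuf[f % 2], dOut, bytes, cudaMemcpyDeviceToHost, 0));
        CHECK(cudaEventRecord(ready[f % 2], 0));
    };

    int ok = 0;
    if (numFrames > 0) enqueue(0);
    for (int f = 0; f < numFrames; ++f) {
        // Queue frame f+1 behind frame f; it reuses the buffer of frame f-1,
        // which the host finished with in the previous iteration.
        if (f + 1 < numFrames) enqueue(f + 1);
        CHECK(cudaEventSynchronize(ready[f % 2]));  // wait only for frame f
        if (hostWork(hBuf[f % 2], n, (unsigned char)f)) ++ok;  // overlaps GPU work on frame f+1
    }

    for (int b = 0; b < 2; ++b) {
        CHECK(cudaFreeHost(hBuf[b]));
        CHECK(cudaEventDestroy(ready[b]));
    }
    CHECK(cudaFree(dOut));
    return ok;
}
```

This improves throughput, not latency: each individual frame still takes at least the full GPU time from submission to readback.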