GTX980 NVENC: Subframe readback support

AdityaR · July 18, 2016, 5:16am

I’m using NVENC (API 6.0) on a GTX980 (Windows 10 + latest drivers). I came across an encoder capability named NV_ENC_CAPS_SUPPORT_SUBFRAME_READBACK and found that it is not supported. I assume this capability indicates asynchronous readback of encoded slice data in slice mode.

Any idea whether this capability is supported on any existing or upcoming graphics cards?
Or will the feature be enabled by future driver updates?

roman380 · January 17, 2019, 6:44pm

NV_ENC_CAPS_SUPPORT_SUBFRAME_READBACK is available on many cards, in particular on the NVIDIA GeForce GTX 980 Ti via NVENCAPI_VERSION 8.0.

You can find some details on the feature's applicability here, on pages 23 and 39.
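A minimal sketch of gating the feature before enabling it. It assumes the caller already has two values: the driver's maximum supported API version as returned by NvEncodeAPIGetMaxSupportedVersion, packed as (major << 4) | minor (the convention the SDK sample's NvEncoder uses when comparing against it), and the value reported for NV_ENC_CAPS_SUPPORT_SUBFRAME_READBACK by nvEncGetEncodeCaps. The 8.0 threshold is taken from the reply above and is an assumption, not a documented minimum; DriverSupportsApi and CanUseSubframeReadback are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: version packed as (major << 4) | minor, as the SDK sample does
// when comparing against NvEncodeAPIGetMaxSupportedVersion().
bool DriverSupportsApi(uint32_t packedMaxVersion, uint32_t major, uint32_t minor) {
    assert(minor < 16);  // the minor number must fit in the low 4 bits
    const uint32_t required = (major << 4) | minor;
    return packedMaxVersion >= required;
}

// capsValue: the value reported for NV_ENC_CAPS_SUPPORT_SUBFRAME_READBACK,
// treated as "supported" when nonzero. The 8.0 requirement is an assumption
// based on the forum reply.
bool CanUseSubframeReadback(uint32_t packedMaxVersion, int capsValue) {
    return DriverSupportsApi(packedMaxVersion, 8, 0) && capsValue != 0;
}
```

The result would then decide whether to set enableSubFrameWrite in the initialization parameters or to fall back to reading whole frames.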

jsxt7920 · August 11, 2020, 2:12am

Should NV_ENC_CAPS_SUPPORT_SUBFRAME_READBACK work in async mode?

roman380 · August 11, 2020, 6:05am

As far as I remember, you need to poll anyway; the completion event will not notify you when partial data becomes available.
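Since only polling reveals partial data, a bounded poll loop is the basic building block. This is a generic sketch: in an NVENC setting the check callback would presumably lock the bitstream with doNotWait set, inspect the size and status, and unlock again, but here it is just any predicate. PollUntil is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls `check` every `interval` until it returns true or `timeout` elapses.
// Returns true if `check` succeeded, false on timeout.
bool PollUntil(const std::function<bool()>& check,
               std::chrono::milliseconds interval,
               std::chrono::milliseconds timeout) {
    assert(interval.count() > 0);  // avoid a busy spin
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (check()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}
```

The timeout keeps a misconfigured encoder (for example, one that never reports completion) from hanging the reading thread forever.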

jsxt7920 · August 11, 2020, 7:42am

Thank you. If I set enableEncodeAsync to 1, will the encoder write an encoded slice to the output bitstream buffer even if the frame is not completely finished? Does NVENC support reading encoded slice data while the frame is still being encoded? I have tried many times but failed.

jsxt7920 · August 11, 2020, 7:50am

Does NV_ENC_CAPS_SUPPORT_SUBFRAME_READBACK mean slice-level readback? Does setting enableSubFrameWrite to 1 mean that when a slice of a frame is encoded, it is immediately written to the output buffer, so it can be read right away even if other slices are still being encoded?

jsxt7920 · August 11, 2020, 8:03am

The slides at https://on-demand.gputechconf.com/gtc/2014/presentations/S4654-detailed-overview-nvenc-encoder-api.pdf
say to poll and read data until NV_ENC_LOCK_BITSTREAM::hwEncodeStatus = 2. What does that mean?
I find that if I set enableEncodeAsync to 1, NV_ENC_LOCK_BITSTREAM::numSlices is always 0.

roman380 · August 11, 2020, 8:05am

As far as I remember, partial “subframe” completion essentially means that a complete NAL unit is available, even though it does not yet make up a full frame. So yes, it is probably an encoded slice. Again, to the best of my knowledge, it worked like this: a NAL unit is added and the running frame size is updated as well. Later, when the next NAL unit is available, it is incrementally appended to the buffer. You can read the buffer again, since further updates just append without changing data that was already added. Eventually the entire frame is completed.
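The append-only behavior described above means a reader only has to remember how many bytes it has already consumed. A minimal sketch, assuming (as the reply does) that bytes already reported never change; IncrementalReader is a hypothetical helper, and in real code the data/size pair would come from bitstreamBufferPtr and bitstreamSizeInBytes of a fresh lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tracks how much of an append-only frame buffer has been read so far.
class IncrementalReader {
public:
    // `data`/`size` describe the current snapshot of the frame's buffer.
    // Returns only the bytes appended since the previous call.
    std::vector<uint8_t> ReadNew(const uint8_t* data, size_t size) {
        assert(size >= consumed_);  // append-only: the size never shrinks
        std::vector<uint8_t> fresh(data + consumed_, data + size);
        consumed_ = size;
        return fresh;
    }

    size_t Consumed() const { return consumed_; }

private:
    size_t consumed_ = 0;  // bytes already handed to the caller
};
```

This also answers "how many bytes to read": the difference between the current size and the previously consumed size.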

jsxt7920 · August 11, 2020, 8:42am

Thank you, I will try it again.

jsxt7920 · August 12, 2020, 5:57am

Do you still remember whether it should use async or sync mode? Should I read the subframes in another thread?

roman380 · August 12, 2020, 6:01am

To the best of my knowledge you can do either, but the async event notification will only notify you on 100% frame completion. Along with that, you can poll and watch the frame being populated NAL by NAL. In sync mode you poll in the same way, just without an event.
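The split described above (completion signaled only at 100%, data visible earlier through polling) can be simulated with two threads. This is a toy model, not NVENC: SimulatedFrame, ProduceFrame and PollFrame are hypothetical, the producer thread plays the role of the hardware appending NAL units, and the `complete` flag stands in for what the completion event would report.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Shared state between the simulated "hardware" and the reader.
struct SimulatedFrame {
    std::mutex m;
    std::vector<uint8_t> bytes;  // grows NAL unit by NAL unit
    bool complete = false;       // set only after the last NAL unit
};

// Simulated hardware: appends each NAL unit, then flags completion.
void ProduceFrame(SimulatedFrame& f, const std::vector<std::vector<uint8_t>>& nals) {
    for (const auto& nal : nals) {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));  // "encode time"
        std::lock_guard<std::mutex> lock(f.m);
        f.bytes.insert(f.bytes.end(), nal.begin(), nal.end());
    }
    std::lock_guard<std::mutex> lock(f.m);
    f.complete = true;
}

// Polls until completion; counts polls that delivered data before completion.
std::vector<uint8_t> PollFrame(SimulatedFrame& f, int& partialReads) {
    std::vector<uint8_t> out;
    partialReads = 0;
    for (;;) {
        bool done;
        {
            std::lock_guard<std::mutex> lock(f.m);  // held only for the copy
            if (f.bytes.size() > out.size()) {
                out.insert(out.end(), f.bytes.begin() + out.size(), f.bytes.end());
                if (!f.complete) ++partialReads;
            }
            done = f.complete;
        }
        if (done) return out;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

A caller would start ProduceFrame on a std::thread, call PollFrame on its own thread, then join. Whether the polling happens on a dedicated thread or the encoding thread is a design choice; the sketch only shows that polling is what exposes the partial data.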

jsxt7920 · August 12, 2020, 6:03am

Then can we control how many subframes a picture is encoded into? Is it set by slice mode, or is it unrelated to slice mode? I see that the NVIDIA samples use multiple input/output buffers. Should I change that to only one?

roman380 · August 12, 2020, 6:13am

I think anything that creates separate VCL NAL units potentially enables subframe readback. Enabling slice mode is the obvious option. I would expect enabling intra-refresh to result in similar behavior too.

With the “traditional” one-NAL-per-frame style, however, subframe readback is unlikely to help: if incomplete NAL units are not reported, nothing becomes readable before the whole frame is done.
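One way to check whether a configuration (slice mode, intra-refresh) actually produces several VCL NAL units per frame is to inspect a completed frame. A sketch for H.264 only, assuming Annex B byte-stream output with 00 00 01 or 00 00 00 01 start codes; in H.264, nal_unit_type is the low 5 bits of the NAL header byte and types 1 to 5 are slice (VCL) units. HEVC uses a two-byte NAL header with a different type layout, so this would not apply to it unchanged. H264NalTypes and CountVclNals are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns nal_unit_type for every H.264 NAL unit that follows an Annex B
// start code (00 00 01; a 4-byte start code is matched by its last 3 bytes).
std::vector<int> H264NalTypes(const std::vector<uint8_t>& bs) {
    std::vector<int> types;
    size_t i = 0;
    while (i + 3 < bs.size()) {  // need 3 start-code bytes plus a header byte
        if (bs[i] == 0 && bs[i + 1] == 0 && bs[i + 2] == 1) {
            types.push_back(bs[i + 3] & 0x1F);
            i += 4;
        } else {
            ++i;
        }
    }
    return types;
}

// Counts VCL NAL units (types 1..5 carry slice data in H.264).
int CountVclNals(const std::vector<uint8_t>& bs) {
    int n = 0;
    for (int t : H264NalTypes(bs)) {
        if (t >= 1 && t <= 5) ++n;
    }
    return n;
}
```

A count of 1 per frame suggests there is nothing to read before the frame completes; a count above 1 means subframe readback has something to deliver early.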

jsxt7920 · August 12, 2020, 7:06am

In the NVIDIA NVENC samples, a variable m_nOutputDelay is used. What does it mean? The encoder writes encoded data to buffer m_iToSend % m_nEncoderBuffer:
picParams.outputBitstream = m_vBitstreamOutputBuffer[m_iToSend % m_nEncoderBuffer];

But when it reads the data, it does this:
unsigned i = 0;
int iEnd = bOutputDelay ? m_iToSend - m_nOutputDelay : m_iToSend;
for (; m_iGot < iEnd; m_iGot++)
{
    WaitForCompletionEvent(m_iGot % m_nEncoderBuffer);
    NV_ENC_LOCK_BITSTREAM lockBitstreamData = { NV_ENC_LOCK_BITSTREAM_VER };
    lockBitstreamData.outputBitstream = vOutputBuffer[m_iGot % m_nEncoderBuffer];
    lockBitstreamData.doNotWait = true;
    NVENC_API_CALL(m_nvenc.nvEncLockBitstream(m_hEncoder, &lockBitstreamData));

    uint8_t *pData = (uint8_t *)lockBitstreamData.bitstreamBufferPtr;
    if (vPacket.size() < i + 1)
    {
        vPacket.push_back(std::vector<uint8_t>());
    }
    vPacket[i].clear();
    vPacket[i].insert(vPacket[i].end(), &pData[0], &pData[lockBitstreamData.bitstreamSizeInBytes]);
    i++;

    NVENC_API_CALL(m_nvenc.nvEncUnlockBitstream(m_hEncoder, lockBitstreamData.outputBitstream));

    if (m_vMappedInputBuffers[m_iGot % m_nEncoderBuffer])
    {
        NVENC_API_CALL(m_nvenc.nvEncUnmapInputResource(m_hEncoder, m_vMappedInputBuffers[m_iGot % m_nEncoderBuffer]));
        m_vMappedInputBuffers[m_iGot % m_nEncoderBuffer] = nullptr;
    }

    if (m_bMotionEstimationOnly && m_vMappedRefBuffers[m_iGot % m_nEncoderBuffer])
    {
        NVENC_API_CALL(m_nvenc.nvEncUnmapInputResource(m_hEncoder, m_vMappedRefBuffers[m_iGot % m_nEncoderBuffer]));
        m_vMappedRefBuffers[m_iGot % m_nEncoderBuffer] = nullptr;
    }

    int cnt = lockBitstreamData.numSlices;
}

Why doesn't it read directly from output buffer m_iToSend % m_nEncoderBuffer?

roman380 · August 12, 2020, 7:22am

It looks like an intentional delay; it does not look related to subframe data.
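The index arithmetic in the sample loop can be isolated to see what the delay does. Assuming m_iToSend already counts the frames submitted when the loop runs, submissions go to slot m_iToSend % m_nEncoderBuffer, but the reader only drains frames up to m_iToSend - m_nOutputDelay, so the most recent m_nOutputDelay submissions stay in flight. SlotsToDrain is a hypothetical helper that reproduces only that arithmetic; it says nothing about why the sample chooses a particular delay.

```cpp
#include <cassert>
#include <vector>

// Frames [iGot, iEnd) are drained, where iEnd holds back `outputDelay`
// frames when the delay is applied. Returns the ring-buffer slots read.
std::vector<int> SlotsToDrain(int iGot, int iToSend, int outputDelay,
                              int numBuffers, bool applyDelay) {
    assert(numBuffers > 0);
    std::vector<int> slots;
    const int iEnd = applyDelay ? iToSend - outputDelay : iToSend;
    for (int i = iGot; i < iEnd; ++i) {
        slots.push_back(i % numBuffers);  // same modulo indexing as the sample
    }
    return slots;
}
```

For example, with 4 buffers, 6 frames submitted, 2 already read and a delay of 2, only slots 2 and 3 are drained; frames 4 and 5 are left for a later call (or for the final flush without delay).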

jsxt7920 · August 12, 2020, 7:25am

It says to poll and read data until NV_ENC_LOCK_BITSTREAM::hwEncodeStatus = 2. What does that mean?
while (lockBitstreamData.hwEncodeStatus != 2)
{
    uint8_t *pData = (uint8_t *)lockBitstreamData.bitstreamBufferPtr;
    if (vPacket.size() < i + 1)
    {
        vPacket.push_back(std::vector<uint8_t>());
    }
    vPacket[i].clear();
    vPacket[i].insert(vPacket[i].end(), &pData[0], &pData[lockBitstreamData.bitstreamSizeInBytes]);
    i++;
}
I used it, but it becomes an infinite loop.

roman380 · August 12, 2020, 7:28am

Don’t keep the buffer locked while looping.

jsxt7920 · August 12, 2020, 7:52am

Then when I read the output buffer, how do I know how many bytes to read each time?
uint8_t *pData = (uint8_t *)pEnc->m_vBitstreamOutputBuffer[pEnc->m_iToSend % pEnc->m_nEncoderBuffer];
Like this, I get the output buffer pointer, but I do not know how many bytes I should read.
Is there a variable that shows the position?

roman380 · August 12, 2020, 7:55am

At some point (while the buffer is not locked!) an update lands: it fills buffer bytes and updates the length and hwEncodeStatus. You poll for these changes.

jsxt7920 · August 12, 2020, 8:03am

std::vector<std::vector<uint8_t>> vPacket;
NV_ENC_LOCK_BITSTREAM lockBitstreamData = { NV_ENC_LOCK_BITSTREAM_VER };
lockBitstreamData.outputBitstream = pEnc->m_vBitstreamOutputBuffer[pEnc->m_iToSend % pEnc->m_nEncoderBuffer];
lockBitstreamData.doNotWait = true;
int i = 0;
while (lockBitstreamData.hwEncodeStatus != 2) {
    uint8_t *pData = (uint8_t *)lockBitstreamData.bitstreamBufferPtr;
    if (vPacket.size() < i + 1)
    {
        vPacket.push_back(std::vector<uint8_t>());
    }
    vPacket[i].clear();
    vPacket[i].insert(vPacket[i].end(), &pData[0], &pData[lockBitstreamData.bitstreamSizeInBytes]);
    i++;
}

Is it right?
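As posted, the loop never calls nvEncLockBitstream, so lockBitstreamData is never refreshed and hwEncodeStatus never changes, which would explain the infinite loop; each pass also copies the whole buffer again into a new packet. Following the replies above (don't keep the buffer locked while looping; poll for changes in length and hwEncodeStatus), here is a self-contained sketch of a lock, copy new bytes, unlock cycle. MockBitstream, LockedView and HardwareStep are hypothetical stand-ins for nvEncLockBitstream/nvEncUnlockBitstream with doNotWait set and for the hardware; treating hwEncodeStatus == 2 as "frame complete" follows the slide quoted earlier and is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// What a lock exposes: buffer pointer, current size, status (hypothetical).
struct LockedView {
    const uint8_t* ptr;
    size_t size;
    int hwEncodeStatus;  // assumption: 2 == frame complete
};

// Simulated output bitstream: updates only land while it is unlocked.
class MockBitstream {
public:
    explicit MockBitstream(std::vector<std::vector<uint8_t>> nals)
        : pending_(std::move(nals)) { assert(!pending_.empty()); }
    LockedView Lock() { locked_ = true; return {bytes_.data(), bytes_.size(), status_}; }
    void Unlock() { locked_ = false; }
    void HardwareStep() {  // appends the next NAL unit if not locked
        if (locked_ || next_ >= pending_.size()) return;
        const auto& nal = pending_[next_++];
        bytes_.insert(bytes_.end(), nal.begin(), nal.end());
        status_ = (next_ == pending_.size()) ? 2 : 1;
    }
private:
    std::vector<std::vector<uint8_t>> pending_;
    std::vector<uint8_t> bytes_;
    size_t next_ = 0;
    int status_ = 0;
    bool locked_ = false;
};

// Re-lock on every iteration, copy only the new bytes, unlock before waiting.
std::vector<std::vector<uint8_t>> ReadSubframes(MockBitstream& bs) {
    std::vector<std::vector<uint8_t>> chunks;
    size_t consumed = 0;
    for (;;) {
        LockedView v = bs.Lock();
        if (v.size > consumed) {
            chunks.emplace_back(v.ptr + consumed, v.ptr + v.size);
            consumed = v.size;
        }
        const bool done = (v.hwEncodeStatus == 2);
        bs.Unlock();          // never hold the lock while waiting
        if (done) return chunks;
        bs.HardwareStep();    // stands in for "wait a bit, then poll again"
    }
}
```

The key differences from the loop as posted are that the lock is taken and released inside the loop, and the amount to read is the growth in size since the previous poll.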
