Decoding a Bitstream using nvcuvid

Christoph1 · November 26, 2013, 7:05pm

Hi,

I’m currently working on my Bachelor’s thesis and am therefore trying to implement a remote renderer.
An OpenGL image is rendered on a server and then encoded with nvcuvenc. The encoded frames are sent to a client, which currently saves them to a file. These files can be decoded with the NVIDIA decoding sample. The problem I’m facing is that I don’t know how to decode the bitstream without writing it to a file first.
The decoding sample uses a wrapper class for the video source (VideoSource) that requires a file as input. I’ve read in recent posts that it should be possible to pass the video stream directly to the VideoParser class, but I don’t know where or how to do this.

I’m fairly new to the whole encoding and decoding thing, so help would be much appreciated.

Greetings from Germany, Christoph

Christoph1 · December 5, 2013, 8:13pm

I solved this particular problem by creating a parser with cuvidCreateVideoParser() and, in each iteration, feeding it the received video data wrapped in a CUVIDSOURCEDATAPACKET.
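The feeding loop described above can be sketched roughly as follows. This is a minimal sketch, not the code from this thread: the nvcuvid call is hidden behind a callback so the example compiles without the SDK, and `BitstreamFeeder` and `PacketSink` are hypothetical names. In the real client the sink would copy its arguments into a CUVIDSOURCEDATAPACKET (`payload`, `payload_size`, and `CUVID_PKT_ENDOFSTREAM` in `flags` for the last packet) and pass it to cuvidParseVideoData() together with the parser handle from cuvidCreateVideoParser(); the parser then invokes the sequence, decode and display callbacks registered in CUVIDPARSERPARAMS.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback type standing in for "fill a CUVIDSOURCEDATAPACKET and
// call cuvidParseVideoData()". endOfStream corresponds to CUVID_PKT_ENDOFSTREAM.
using PacketSink =
    std::function<void(const unsigned char* payload, std::size_t size, bool endOfStream)>;

// Forwards each buffer received from the server to the parser in arrival order
// and sends exactly one end-of-stream packet when the connection is closed.
class BitstreamFeeder {
public:
    explicit BitstreamFeeder(PacketSink sink) : sink_(std::move(sink)) {}

    // Called once per network receive; empty buffers and data arriving after
    // finish() are ignored.
    void feed(const std::vector<unsigned char>& received) {
        if (finished_ || received.empty()) {
            return;
        }
        sink_(received.data(), received.size(), false);
    }

    // Called when the server closes the connection; the empty end-of-stream
    // packet lets the parser flush any pictures it is still holding.
    void finish() {
        if (!finished_) {
            sink_(nullptr, 0, true);
            finished_ = true;
        }
    }

private:
    PacketSink sink_;
    bool finished_ = false;
};
```

Keeping the transport (socket, file, queue) separate from the parser call means the same decoder code works whether the bytes come from a file or from the network.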

The application uses some of the wrapper classes from the NVIDIA decoding sample. The new problem is that cuGLMapBufferObject fails on the second frame.

Mapping the first decoded frame succeeds, but for the second frame the returned pitch is 0 and the assertion fails. The call looks like this:

void
ImageGL::map(CUdeviceptr *pImageData, size_t *pImagePitch, int field_num)
{
    checkCudaErrors(cuGLMapBufferObject(pImageData, pImagePitch, gl_pbo_[field_num]));
    assert(0 != *pImagePitch);
}
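A note on that assertion: in the CUDA driver API, the second output parameter of cuGLMapBufferObject is the size of the mapped buffer in bytes, not a row pitch, so the assert is really checking that the mapped PBO is non-empty. A slightly stricter check compares that size against what one ARGB frame needs. This is a sketch with hypothetical helper names, assuming 4 bytes per pixel as in the NV12-to-ARGB conversion used here:

```cpp
#include <cstddef>

// Bytes needed for one width x height image at 4 bytes per pixel (ARGB).
constexpr std::size_t argbBufferBytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}

// True if the size reported when mapping the PBO can hold one ARGB frame.
constexpr bool mappedSizeIsUsable(std::size_t mappedBytes, std::size_t width, std::size_t height) {
    return mappedBytes != 0 && mappedBytes >= argbBufferBytes(width, height);
}
```

cuGLMapBufferObject is deprecated in later CUDA releases; the replacement path (cuGraphicsGLRegisterBuffer, cuGraphicsMapResources, cuGraphicsResourceGetMappedPointer) reports the mapped size in the same way, so the check carries over.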

I’m initializing the CUDA and OpenGL resources like this:

void initGL(int argc, char** argv)
{
	int deviceCount = 0;
	char deviceName[256] = "";

	checkCudaErrors(cuDeviceGetCount(&deviceCount));
	if (deviceCount == 0)
	{
		printf("No CUDA device found!\n");
		exit(EXIT_FAILURE);
	}

	// Note: this loop leaves m_device set to the LAST enumerated device.
	for(int i = 0; i < deviceCount; i++)
	{
		cuDeviceGet(&m_device, i);
		cuDeviceGetName(deviceName, 256, m_device);
	}

	printf("Device %d: %s is used!\n", m_device, deviceName);

	glutInit(&argc, argv);
	glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
	glutInitWindowSize(m_width, m_height);
	glutCreateWindow("Decoder!");
	initCallbacks();
	glewInit();

	checkCudaErrors(cuGLCtxCreate(&m_ctx, CU_CTX_BLOCKING_SYNC, m_device));
	checkCudaErrors(cuvidCtxLockCreate(&m_lock, m_ctx));

	// Backslashes must be escaped, otherwise "\v" and "\3" become control characters.
	// (Currently unused: NULL is passed to CUmoduleManager below.)
	const char* exec_path = "C:\\ProgramData\\NVIDIA Corporation\\CUDA Samples\\v5.5\\3_Imaging\\cudaDecodeGL";
	try{
		m_cudaModule = new CUmoduleManager("NV12ToARGB_drvapi_x64.ptx", NULL, 2, 2, 2);
	}
	catch (const char* error)
	{
		printf("\n>> CUmoduleManager::Exception!  %s not found!\n", error);
		printf(">> Please rebuild NV12ToARGB_drvapi.cu or re-install this sample.\n");
		exit(EXIT_FAILURE); // m_cudaModule is not valid past this point
	}
	m_cudaModule->GetCudaFunction("NV12ToARGB_drvapi", &m_kernelNV12toARGB);
	m_cudaModule->GetCudaFunction("Passthru_drvapi", &m_kernelPassThrough);
}
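The exec_path literal in initGL contains backslashes that C++ interprets as escape sequences: "\v" becomes a vertical tab, "\3" an octal escape, and sequences such as "\P" have implementation-defined meaning. Two ways to write such a Windows path correctly are doubling every backslash or using a C++11 raw string literal. This small sketch (the constant names are illustrative) shows that both spell the same path:

```cpp
#include <string>

// Every backslash doubled inside an ordinary string literal.
const std::string kEscapedPath =
    "C:\\ProgramData\\NVIDIA Corporation\\CUDA Samples\\v5.5\\3_Imaging\\cudaDecodeGL";

// C++11 raw string literal: the text between R"( and )" is taken verbatim.
const std::string kRawPath =
    R"(C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.5\3_Imaging\cudaDecodeGL)";
```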

int main(int argc, char** argv)
{
	checkCudaErrors(cuInit(0));
	initGL(argc, argv);

	// The auto_ptr was released immediately, so a plain new is equivalent.
	m_queue = new FrameQueue;

	m_decoder = new Decoder(m_ctx, m_lock, m_queue, m_width, m_height);
	m_decoder->initParser();
	initGLTexture(m_decoder->getDecoderParams().ulTargetWidth, m_decoder->getDecoderParams().ulTargetHeight);

// ...
}

The function that copies the current frame into the texture looks like this:

void copyFrameToTexture(CUVIDPARSERDISPINFO frame)
{
	CCtxAutoLock lck(m_lock);
	checkCudaErrors(cuCtxPushCurrent(m_ctx));

	// numFields can be 3 (interlaced frame with repeat_first_field set),
	// so the per-field arrays need three entries.
	CUdeviceptr decodedFrame[3] = { 0, 0, 0 };
	CUdeviceptr interopFrame[3] = { 0, 0, 0 };

	int numFields = (frame.progressive_frame? (1) : (2+frame.repeat_first_field));

	for(int active = 0; active < numFields; active++)
	{
		CUVIDPROCPARAMS procParams;
		memset(&procParams, 0, sizeof(CUVIDPROCPARAMS));

		procParams.progressive_frame = frame.progressive_frame;
		procParams.second_field = active;
		procParams.top_field_first = frame.top_field_first;
		procParams.unpaired_field = (numFields == 1);

		unsigned int decodedPitch = 0;

		// Map the decoded NV12 surface for this field.
		m_decoder->mapFrame(frame.picture_index, &decodedFrame[active], &decodedPitch, &procParams);

		// map() reports the PBO size, so the ARGB row pitch (width * 4 bytes) is set explicitly.
		size_t framePitch = 0;
		m_image->map(&interopFrame[active], &framePitch, active);
		framePitch = m_width*4;

		// Convert NV12 to ARGB directly into the mapped PBO.
		cudaPostProcessFrame(&decodedFrame[active], decodedPitch, &interopFrame[active], framePitch, m_cudaModule->getModule(), m_kernelNV12toARGB, NULL);

		m_image->unmap(active);
		m_decoder->unmapFrame(decodedFrame[active]);
	}

	// Release the picture only after all of its fields have been processed.
	m_queue->releaseFrame(&frame);

	checkCudaErrors(cuCtxPopCurrent(NULL));
}
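In copyFrameToTexture, numFields is 2 + repeat_first_field for interlaced frames, so it reaches 3 when repeat_first_field is 1. Every per-field array indexed by `active` therefore needs at least three entries: the decodedFrame/interopFrame arrays, and also `gl_pbo_` inside ImageGL, which `map()` indexes with the field number. A small sketch that makes the bound explicit; `fieldsToProcess` and `kMaxFieldsPerFrame` are hypothetical names, and the clamp is an assumption added to guard against repeat_first_field values other than 0 and 1:

```cpp
// Upper bound on the fields handled per displayed frame, and therefore the
// minimum size of every per-field array indexed by the field number.
constexpr int kMaxFieldsPerFrame = 3;

// Same expression as in copyFrameToTexture (one field for a progressive frame,
// two for an interlaced frame, plus repeat_first_field), clamped to
// [1, kMaxFieldsPerFrame] so no value can index past the arrays.
int fieldsToProcess(int progressive_frame, int repeat_first_field) {
    int n = progressive_frame ? 1 : 2 + repeat_first_field;
    if (n < 1) n = 1;
    if (n > kMaxFieldsPerFrame) n = kMaxFieldsPerFrame;
    return n;
}
```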

So, as mentioned above, m_image->map() fails on the second call. Does anyone have an idea why this happens? Any help is much appreciated. Greetings!