I am trying to use FFmpeg hardware input with the NVENC encoder. I think I have the FFmpeg side set up correctly, but I do not think I am copying the texture into CUDA memory correctly.
I have allocated memory for the CUDA buffer, registered my texture and mapped it to a CUDA array, and now I am trying to copy my OpenGL texture into CUDA memory (with cuMemcpyAtoD) for NVENC input.
Here is the setup code for the resource:
CUresult res;
CUcontext oldCtx;
m_inputTexture = texture;
res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
res = cuCtxPushCurrent(*m_cuContext);
res = cuGraphicsGLRegisterImage(&cuInpTexRes, m_inputTexture, GL_TEXTURE_2D, CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY);
res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
Here is my alloc:
CUresult cuRes;
CUcontext oldCtx;
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
cuRes = cuCtxPushCurrent(*m_cuContext);
cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * width, height, 16); //4, 8, 16
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
Then, in my encode_frame() function, this runs before the frame is sent to FFmpeg:
//Perform cuda mem copy for input buffer
CUresult cuRes;
NVENCSTATUS encStat;
CUarray mappedArray;
CUcontext oldCtx;
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
cuRes = cuCtxPushCurrent(*m_cuContext);
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
//Cuda unmap
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0); // I've tried having this line here and after my avcodec_send_frame() call
cuRes = cuMemcpyAtoD(cuDevPtr, mappedArray, 0, width * 4 * height); // FIXME: This fails with an invalid value error. For some reason width * 4 works, but width * 4 + 1 doesn't. It should be width * height * 4.
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
//...
ret = avcodec_send_frame(c, rgb_frame);
The tests I ran had width and height both at 512 (edited; I originally wrote 256).
The CUcontext I use (m_cuContext) was created by FFmpeg in my initialization phase, before any of the above code.
When I run the line
cuRes = cuMemcpyAtoD(cuDevPtr, mappedArray, 0, width * 4 * height);
I get CUDA_ERROR_INVALID_VALUE back, and I cannot figure out why.
One thing that seems curious to me is that the cuDevPtr I get from the alloc call is always a large value, something like 51569229824. This seems suspect.
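For what it is worth, here is a small sketch of how I looked at that value in hex. Note that to_hex is a hypothetical helper of mine, not part of CUDA, and I am assuming the pointer can be treated as a 64-bit integer:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a 64-bit value (for example a device pointer
// cast to an integer) as lowercase hex with a 0x prefix.
std::string to_hex(std::uint64_t value) {
    char buf[2 + 16 + 1];  // "0x" + up to 16 hex digits + terminator
    std::snprintf(buf, sizeof buf, "0x%llx",
                  static_cast<unsigned long long>(value));
    return std::string(buf);
}
```

Calling to_hex(51569229824ULL) gives 0xc01c40000, which at least looks like a 64-bit address rather than garbage, so the pointer itself may be fine.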
The most I am able to copy with cuMemcpyAtoD appears to be 2048 bytes (512 * 4), which is exactly one row of the texture.
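To make the layout I am assuming explicit, here is a CPU-only sketch with no CUDA calls. The names round_up and copy_rows_to_pitched are hypothetical helpers, and the pitch value is made up, since the real one is whatever cuMemAllocPitch returns. It shows why a single linear count of width * 4 * height does not describe a pitched destination: the rows are width * 4 bytes long but start cuMemPitch bytes apart, so the copy has to be expressed row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the pitch chosen by the allocator: round the
// row size up to a multiple of `align`. The real pitch is driver-defined.
std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Copy `height` rows of `row_bytes` each from a tightly packed source into
// a destination whose rows start `dst_pitch` bytes apart.
void copy_rows_to_pitched(std::uint8_t* dst, std::size_t dst_pitch,
                          const std::uint8_t* src, std::size_t row_bytes,
                          std::size_t height) {
    assert(dst_pitch >= row_bytes);  // padding may follow a row, never overlap it
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + y * dst_pitch, src + y * row_bytes, row_bytes);
    }
}
```

With width = 512 and 4 bytes per pixel, one row is 2048 bytes, which matches the largest size cuMemcpyAtoD accepts for me. That makes me suspect the copy needs a 2D description (the driver API has cuMemcpy2D with a CUDA_MEMCPY2D descriptor for this), but I have not confirmed that this is the fix.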