cuMemcpyAtoD returns invalid_value error

I am trying to use FFMPEG hardware input with the nvenc encoder. I think I have the FFMPEG side set up correctly, but I do not think I am copying the texture into CUDA memory correctly.

I have allocated memory for the CUDA buffer, registered my texture and mapped it to a CUarray, and now I am trying to cuMemcpy my (OGL) texture to CUDA for nvenc input.

Here is the setup code for the resource:

CUresult res;
CUcontext oldCtx;
m_inputTexture = texture;
res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
res = cuCtxPushCurrent(*m_cuContext);
res = cuGraphicsGLRegisterImage(&cuInpTexRes, m_inputTexture, GL_TEXTURE_2D, CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY);
res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL

Here is my alloc:

CUresult cuRes;
CUcontext oldCtx;
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
cuRes = cuCtxPushCurrent(*m_cuContext);
cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * width, height, 16); // element size: 4, 8, or 16
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL

Then in my encode_frame() function, this is run before sending the frame to ffmpeg:

// Perform CUDA mem copy for input buffer
CUresult cuRes;
NVENCSTATUS encStat;
CUarray mappedArray;
CUcontext oldCtx;

cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
cuRes = cuCtxPushCurrent(*m_cuContext);
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
// CUDA unmap
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0); // I've tried having this line here and after my avcodec_send_frame() call

// FIXME: This fails with an invalid value error. For some reason width * 4 works,
// but width * 4 + 1 doesn't. It should be width * height * 4, however.
cuRes = cuMemcpyAtoD(cuDevPtr, mappedArray, 0, width * 4 * height);
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL

//...

ret = avcodec_send_frame(c, rgb_frame);

The tests I ran had width and height both at 256. EDIT: 512.
The CUcontext I use (m_cuContext) was generated by FFMPEG in my initialization phase before any of the above code.

When I run the line

cuRes = cuMemcpyAtoD(cuDevPtr, mappedArray, 0, width * 4 * height);

I get a return of CUDA_ERROR_INVALID_VALUE, and I cannot seem to figure out why.

One thing that seems curious to me is that the cuDevPtr I get from the alloc call always seems to be a large value, something like 51569229824… this seems suspect.

So it seems the most I am able to cuMemcpyAtoD is 2048 bytes (512 * 4).

Hi Ian,

Copying from a CUDA array to a pitched device memory segment needs cuMemcpy2D instead of cuMemcpyAtoD, as cuMemcpyAtoD does not know about the pitch. That is why you are able to copy one line with AtoD, but it fails as soon as you try to copy more than that.

Best regards
Stefan

For reference, from the cuMemcpy2D documentation:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice and srcPitch are ignored.
If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. dstArray is ignored.