cuMemcpy3D returns CUDA_ERROR_INVALID_VALUE

Hello, what would cause CUDA to return CUDA_ERROR_INVALID_VALUE from cuMemcpy3D? My code is based on the OptiX advanced samples, and the block in question is shown below.

I’ve checked the values in the struct in the debugger, and they all look correct. The source pointer is good, and every field of CUDA_MEMCPY3D other than the ones I set explicitly is 0.

If INTEROP_TEXTURE is 0, the surrounding code copies the rendering buffer back to the host and then uploads it to the texture with glTexImage2D. That path works as expected; it’s just slow. As far as I can tell I’ve done all the necessary steps here, so I’m not sure whether there’s some initialization or GL state problem, or some call I’ve missed. I’ve scoured the example and the docs and can’t see what that might be.

This is on Windows with 535.98 drivers.

// --- before the rendering loop, after the texture is created
// register texture with cuda
#if INTEROP_TEXTURE
CUgraphicsResource cuda_graphics_resource;
CU_CHECK(
  cuGraphicsGLRegisterImage(&cuda_graphics_resource,
                            texture,
                            GL_TEXTURE_2D,
                            CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD));
#endif

// --- in the rendering loop
#if INTEROP_TEXTURE
CU_CHECK(cuGraphicsMapResources(
  1,
  &cuda_graphics_resource,
  renderer.device().cuda_stream())); 

CUarray dst_array = nullptr;
CU_CHECK(cuGraphicsSubResourceGetMappedArray(&dst_array,
                                             cuda_graphics_resource,
                                             0 /*array index*/,
                                             0 /*mip level*/));

CUDA_MEMCPY3D params = {};
params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
params.srcDevice = renderer.color_buffer().device_ptr();
params.srcPitch =
  renderer.launch_params().fb_width * sizeof(f32) * 4;
params.srcHeight = renderer.launch_params().fb_height;

params.dstMemoryType = CU_MEMORYTYPE_ARRAY;
params.dstArray = dst_array;
params.WidthInBytes =
  renderer.launch_params().fb_width * sizeof(f32) * 4;
params.Height = renderer.launch_params().fb_height;
params.Depth = 1;

CU_CHECK(cuMemcpy3D(&params)); //< THIS CALL FAILS

CU_CHECK(cuGraphicsUnmapResources(
  1,
  &cuda_graphics_resource,
  renderer.device().cuda_stream()));
#endif
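
The struct values set above can also be cross-checked against the destination array outside the debugger. The following is only a minimal sketch in plain C++ with no CUDA dependency: CopyExtent, ArrayExtent and check_copy are hypothetical names invented for illustration, and the checks are a guess at a few consistency conditions, not the full set of things the driver validates. In real code the destination width, height and element size would have to come from the mapped array itself, for example by calling cuArrayGetDescriptor on dst_array.

```cpp
#include <cstddef>

// Hypothetical stand-in for the CUDA_MEMCPY3D fields set above; not a CUDA type.
struct CopyExtent {
    std::size_t src_pitch;      // bytes between consecutive source rows
    std::size_t src_height;     // rows in the source allocation
    std::size_t width_in_bytes; // bytes copied per row
    std::size_t height;         // rows copied
    std::size_t depth;          // slices copied
};

// Hypothetical description of the destination array.
struct ArrayExtent {
    std::size_t width;          // elements per row
    std::size_t height;         // rows
    std::size_t bytes_per_elem; // e.g. 16 for a 4 x float texel, 4 for a 4 x 8-bit texel
};

// Returns nullptr when the extents look consistent, otherwise a short reason.
const char* check_copy(const CopyExtent& c, const ArrayExtent& a) {
    if (c.width_in_bytes == 0 || c.height == 0 || c.depth == 0)
        return "empty copy extent";
    if (a.bytes_per_elem == 0)
        return "destination element size is zero";
    if (c.src_pitch < c.width_in_bytes)
        return "srcPitch is smaller than WidthInBytes";
    if (c.src_height < c.height)
        return "srcHeight is smaller than Height";
    if (c.width_in_bytes % a.bytes_per_elem != 0)
        return "WidthInBytes is not a multiple of the destination element size";
    if (c.width_in_bytes > a.width * a.bytes_per_elem)
        return "WidthInBytes exceeds the destination row size";
    if (c.height > a.height)
        return "Height exceeds the destination height";
    return nullptr;
}
```

With a 1920x1080 framebuffer of four floats per pixel, check_copy returns nullptr for a destination with 16 bytes per element and a non-null reason for a destination with 4 bytes per element, since each row of the copy would then be wider than a row of the array.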