I’m seeing some odd artifacts trying to do vulkan interop that look like uninitialized memory, the presence of which depend on the exact values I write in my cuda kernel. Sure I’m probably doing something wrong, but no idea what it might be. I’m on a laptop A5500, Ubuntu 22.04 and the behaviour is the same with R535, R780 and R580
I’m doing interop by ([0] indicates main thread, [1] indicates render thread):
- [0] Create image in Vulkan with TILING_LINEAR and importing to CUDA as a buffer
- [0] Create a timeline semaphore in Vulkan and importing to CUDA
- [0] Transition the image from UNDEFINED to GENERAL layout (submit and wait on queue completion)
- [1] Write the buffer using the kernel below (simple u/v gradient). note: kernel launch is not from the main thread. cuda context is created in the main thread and then cuCtxSetCurrent() in the render thread before any other cuda calls. All importing of the memory from vulkan happens on the main thread after cuCtxSetCurrent() is first called from the render thread.
- [1] cuSignalExternalSemaphoreAsync() the timeline semaphore with a value of 1
- [1] cuStreamSynchronize()
- [1] cuMemcpyDtoH() the buffer and write to exr (first image shown below).
- [0] create a transfer buffer, transition image from GENERAL to TRANSFER_SRC_OPTIMAL, vkCmdCopyImageToBuffer, vkMapMemory, write exr from resulting pointer. Queue waits on the semaphore to reach value 1 in step 5
- [0] transition image to SHADER_READ_ONLY_OPTIMAL, use it as a texture on a quad to draw and present to screen.
The image copied to host from CUDA always looks correct. The Vulkan image (either copied to a buffer and mapped, or visually inspected from the presentation) only looks correct when a value other than 0 is written to the fourth component of c. I think I caught it showing some corruption when writing 0.1 but have been unable to reproduce that since.
The kernel:
extern "C" __global__
void write_gradient(float4* base, int width, int height, size_t rowPitchBytes)
{
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
return;
float fx = (float)x / (float)(width ? width : 1);
float fy = (float)y / (float)(height ? height : 1);
float4 c;
c.x = fx; // red = x gradient
c.y = fy; // green = y gradient
c.z = 0.0f;
c.w = 0.0f;
int idx = y * width + x;
base[idx] = c;
}
output from cuMemcpyDtoH when writing alpha either 0 or 1, and from vkCmdCopyImageToBuffer/present when alpha is 1:
output from vkCmdCopyImageToBuffer when alpha is 0:

