CUDA->Vulkan interop shows (uninitialized memory?) artifacts depending on the value written by CUDA

I’m seeing odd artifacts when doing Vulkan interop that look like uninitialized memory, and whether they appear depends on the exact values I write in my CUDA kernel. I’m probably doing something wrong, but I have no idea what it might be. This is a laptop A5500 on Ubuntu 22.04, and the behaviour is the same with R535, R780 and R580.

I’m doing interop by ([0] indicates main thread, [1] indicates render thread):

  1. [0] Create an image in Vulkan with TILING_LINEAR and import it into CUDA as a buffer
  2. [0] Create a timeline semaphore in Vulkan and import it into CUDA
  3. [0] Transition the image from UNDEFINED to GENERAL layout (submit and wait on queue completion)
  4. [1] Write the buffer using the kernel below (simple u/v gradient). Note: the kernel launch is not from the main thread. The CUDA context is created on the main thread, and cuCtxSetCurrent() is called on the render thread before any other CUDA calls. All importing of the memory from Vulkan happens on the main thread, after cuCtxSetCurrent() has first been called from the render thread.
  5. [1] cuSignalExternalSemaphoreAsync() the timeline semaphore with a value of 1
  6. [1] cuStreamSynchronize()
  7. [1] cuMemcpyDtoH() the buffer and write it to EXR (first image shown below).
  8. [0] Create a transfer buffer, transition the image from GENERAL to TRANSFER_SRC_OPTIMAL, vkCmdCopyImageToBuffer, vkMapMemory, and write an EXR from the resulting pointer. The queue submission waits for the semaphore to reach the value 1 signaled in step 5.
  9. [0] Transition the image to SHADER_READ_ONLY_OPTIMAL and use it as a texture on a quad to draw and present to the screen.
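For reference, the import in steps 1–2 can be sketched with the CUDA driver API roughly as below. This is a sketch only: memFd/semFd are assumed to come from vkGetMemoryFdKHR/vkGetSemaphoreFdKHR, allocSize from the VkMemoryAllocateInfo, and error checking is omitted.

```cpp
// Import the Vulkan device memory backing the linearly tiled image.
CUDA_EXTERNAL_MEMORY_HANDLE_DESC memDesc = {};
memDesc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD;
memDesc.handle.fd = memFd;   // CUDA takes ownership of this fd on success
memDesc.size = allocSize;
CUexternalMemory extMem;
cuImportExternalMemory(&extMem, &memDesc);

// Map the whole allocation as a plain device buffer.
CUDA_EXTERNAL_MEMORY_BUFFER_DESC bufDesc = {};
bufDesc.offset = 0;
bufDesc.size = allocSize;
CUdeviceptr dptr;
cuExternalMemoryGetMappedBuffer(&dptr, extMem, &bufDesc);

// Import the timeline semaphore.
CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC semDesc = {};
semDesc.type = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_TIMELINE_SEMAPHORE_FD;
semDesc.handle.fd = semFd;
CUexternalSemaphore extSem;
cuImportExternalSemaphore(&extSem, &semDesc);
```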

The image copied to the host from CUDA always looks correct. The Vulkan image (whether copied to a buffer and mapped, or visually inspected in the presented frame) only looks correct when a value other than 0 is written to the fourth component of c. I think I once caught it showing some corruption when writing 0.1, but I have been unable to reproduce that since.

The kernel:

extern "C" __global__
void write_gradient(float4* base, int width, int height, size_t rowPitchBytes)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x >= width || y >= height)
        return;

    float fx = (float)x / (float)(width  ? width  : 1);
    float fy = (float)y / (float)(height ? height : 1);

    float4 c;
    c.x = fx;      // red = x gradient
    c.y = fy;      // green = y gradient
    c.z = 0.0f;
    c.w = 0.0f;

    // address rows via the image's row pitch; a linearly tiled image's
    // rows may be padded beyond width * sizeof(float4)
    float4* row = (float4*)((char*)base + (size_t)y * rowPitchBytes);
    row[x] = c;
}
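Since the image is linearly tiled, the row pitch passed to the kernel should come from vkGetImageSubresourceLayout rather than being assumed equal to width * sizeof(float4). A minimal host-side sketch of the launch (driver API; device, image, dptr, stream, and a write_gradient_fn CUfunction are assumed to exist):

```cpp
// Query the actual row pitch of the linear image (rows may be padded).
VkImageSubresource sub = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0 };
VkSubresourceLayout layout;
vkGetImageSubresourceLayout(device, image, &sub, &layout);
size_t rowPitchBytes = (size_t)layout.rowPitch;

// Launch write_gradient over the image.
unsigned int bx = 16, by = 16;
unsigned int gx = (width  + bx - 1) / bx;
unsigned int gy = (height + by - 1) / by;
void* args[] = { &dptr, &width, &height, &rowPitchBytes };
cuLaunchKernel(write_gradient_fn,
               gx, gy, 1,     // grid
               bx, by, 1,     // block
               0 /*sharedMemBytes*/, stream, args, nullptr);
```

Note that a cuMemcpyDtoH readback of a pitched image then also has to honor rowPitchBytes when interpreting rows.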

output from cuMemcpyDtoH when writing alpha either 0 or 1, and from vkCmdCopyImageToBuffer/present when alpha is 1:

output from vkCmdCopyImageToBuffer when alpha is 0:

This does not look like uninitialized memory; it looks like undefined behavior due to missing or incorrect synchronization between CUDA and Vulkan. Without an explicit ownership transfer and memory-visibility guarantees on the Vulkan side, the observed values are not meaningful.
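Concretely, "visibility" here means the Vulkan submission that consumes the CUDA-written image must actually wait on the value CUDA signals before any of its transfer work starts. A hedged sketch of such a submit, assuming importedTimelineSem, cmdBuf and queue exist elsewhere:

```cpp
// The value signaled by cuSignalExternalSemaphoresAsync on the CUDA side.
uint64_t waitValue = 1;

VkTimelineSemaphoreSubmitInfo timeline = {};
timeline.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO;
timeline.waitSemaphoreValueCount = 1;
timeline.pWaitSemaphoreValues = &waitValue;

// Block the transfer stage until the semaphore reaches waitValue.
VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_TRANSFER_BIT;

VkSubmitInfo submit = {};
submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submit.pNext = &timeline;
submit.waitSemaphoreCount = 1;
submit.pWaitSemaphores = &importedTimelineSem;
submit.pWaitDstStageMask = &waitStage;
submit.commandBufferCount = 1;
submit.pCommandBuffers = &cmdBuf; // holds the GENERAL -> TRANSFER_SRC_OPTIMAL barrier + copy
vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
```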

Thanks for the reply! Could you explain a bit more what you mean by ownership transfer? I’m just using an external (timeline) semaphore to signal to Vulkan that CUDA is done, which, from the only example I’ve been able to find, looks like all that is required. Is there another step I’m missing?

I share memory between Vulkan and CUDA and I followed the CUDA samples GitHub repo. I use two external semaphores, not one: a Vulkan-to-CUDA semaphore that CUDA waits on, and a CUDA-to-Vulkan semaphore that Vulkan waits on.

    checkCudaErrors(cudaWaitExternalSemaphoresAsync(cuda_v2c_semaphore, &params, 1, cuda_stream));
    // ... launch optix/kernels
    checkCudaErrors(cudaSignalExternalSemaphoresAsync(cuda_c2v_semaphore, &params, 1, cuda_stream));

On the Vulkan side I do the same. I’m personally not using a timeline semaphore, but those are probably better. There are a lot of Vulkan-CUDA interop examples in the CUDA samples repo.

On the Vulkan side I sync as normal. On the CUDA side I also synchronize my CUDA stream prior to signaling the semaphore. I also have these pairs of semaphores per frame in flight, as well as per-frame-in-flight shared memory and per-frame-in-flight CUDA streams.
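The per-frame-in-flight arrangement described above might look roughly like this (runtime API; all names are hypothetical):

```cpp
// One bundle of interop resources per frame in flight.
struct FrameInterop {
    cudaExternalSemaphore_t v2c;       // Vulkan signals, CUDA waits
    cudaExternalSemaphore_t c2v;       // CUDA signals, Vulkan waits
    cudaStream_t            stream;    // per-frame CUDA stream
    void*                   sharedMem; // this frame's mapped external memory
};

void cudaRenderFrame(FrameInterop& f)
{
    cudaExternalSemaphoreWaitParams   wait = {};
    cudaExternalSemaphoreSignalParams sig  = {}; // for timeline semaphores,
                                                 // set sig.params.fence.value
    cudaWaitExternalSemaphoresAsync(&f.v2c, &wait, 1, f.stream);
    // ... launch kernels writing f.sharedMem on f.stream ...
    cudaSignalExternalSemaphoresAsync(&f.c2v, &sig, 1, f.stream);
}
```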