CUDA vs Vulkan performance difference

Hello,

I asked the same question in the CUDA forum but was redirected here. The original question has the CUDA kernel; here is the GLSL shader for completeness:

// Naive N x N matrix multiply: each invocation computes one element of c = a * b
// (all three matrices stored row-major).
void main() {
    uint row = gl_GlobalInvocationID.x;
    uint col = gl_GlobalInvocationID.y;

    uint offset = N * row;

    float result = 0.0f;

    for (uint s = 0; s < N; ++s)
    {
      result += a[offset + s] * b[col + s * N];
    }

    c[offset + col] = result;
}
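For completeness, the snippet omits the declarations; the shader assumes something along these lines before main() (the exact bindings, workgroup size, and how N is passed are only illustrative and don't matter for the question):

#version 450
// Illustrative declarations: storage buffers for the matrices, N as a push constant.
layout(local_size_x = 16, local_size_y = 16) in;

layout(std430, binding = 0) readonly buffer A { float a[]; };
layout(std430, binding = 1) readonly buffer B { float b[]; };
layout(std430, binding = 2) writeonly buffer C { float c[]; };

layout(push_constant) uniform Params { uint N; };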

In an attempt to understand the difference, I used the VK_KHR_shader_clock extension to time individual invocations (with clockRealtimeEXT()) and print the result (with debugPrintfEXT()). I did the same in the CUDA kernel with clock64() and printf(). I am not sure this comparison is valid at all, because the functions return clock ticks and Vulkan gives no way to convert them to seconds, but since both run on the same hardware I expect the ratio to be meaningful. After doing this I see the same gap as before: on a Quadro P1000 the numbers are ~1845248 for Vulkan and ~568672 for CUDA, roughly the same 3x difference that was measured with other timers.
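For reference, the measurement in the shader looks roughly like this (a sketch, not the exact code; it also needs the shaderDeviceClock feature of VK_KHR_shader_clock and debug printf enabled on the host side):

#extension GL_EXT_shader_realtime_clock : require  // clockRealtimeEXT(), backed by VK_KHR_shader_clock
#extension GL_EXT_debug_printf : require           // debugPrintfEXT()
#extension GL_ARB_gpu_shader_int64 : require       // uint64_t in GLSL

void main() {
    uint row = gl_GlobalInvocationID.x;
    uint col = gl_GlobalInvocationID.y;
    uint offset = N * row;

    uint64_t start = clockRealtimeEXT();

    float result = 0.0f;
    for (uint s = 0; s < N; ++s)
    {
      result += a[offset + s] * b[col + s * N];
    }
    c[offset + col] = result;

    // The difference fits in 32 bits here, so print it as a plain uint
    // (in practice I limit which invocations print to avoid flooding the printf buffer).
    uint elapsed = uint(clockRealtimeEXT() - start);
    debugPrintfEXT("invocation (%u, %u): %u ticks", row, col, elapsed);
}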

As @Robert_Crovella pointed out, it was my mistake: I swapped the row and column indexing between the two versions:

// Vulkan
uint row = gl_GlobalInvocationID.x;
uint col = gl_GlobalInvocationID.y;

// CUDA
uint32_t const row{blockIdx.y * blockDim.y + threadIdx.y};
uint32_t const col{blockIdx.x * blockDim.x + threadIdx.x};

As a reminder, gl_GlobalInvocationID.x is equal to gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x. So the indexing was reversed in Vulkan, which wrecked the memory access pattern: with the row taken from .x, invocations that are adjacent in x read and write with a stride of N instead of touching adjacent elements, so the accesses are not coalesced.
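With the indexing fixed, the shader maps x to columns just like the CUDA kernel, and adjacent invocations touch adjacent memory (the body of main() after the fix):

// x is the fastest-varying invocation index, so it should map to columns,
// matching the CUDA kernel above.
uint col = gl_GlobalInvocationID.x;
uint row = gl_GlobalInvocationID.y;

uint offset = N * row;

float result = 0.0f;
for (uint s = 0; s < N; ++s)
{
  // b[col + s * N] and c[offset + col]: adjacent invocations now read/write
  // adjacent elements; a[offset + s] is the same address for the whole row,
  // so the accesses coalesce.
  result += a[offset + s] * b[col + s * N];
}
c[offset + col] = result;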