Unusually slow NVIDIA Vulkan API on Linux: why?

Hello, I am comparing the CUDA and Vulkan compute performance of my code (a simulation of cardiac tissue). I measured wall time and CPU time. The results show that:

  1. With the normal kernels, Vulkan is almost three times slower than CUDA (0.38 vs 0.13 s)
  2. When using empty kernels, CUDA is much faster than Vulkan (0.05 vs 0.32 s)

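The gap with empty kernels (about 0.27 s) is roughly as large as the gap with the normal kernels (about 0.25 s), so the slowdown is probably caused by the driver API and not by the kernel code itself. Any ideas or comments?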

Setup: Linux kernel 4.19.2, NVIDIA driver 410.73, CUDA 10
Here is the full code: https://drive.google.com/open?id=1Q72ERuCLypzvRjAzSZ3a3bLD9k2JEGuW
The code uses the vuh helper library, and we are discussing this issue there: https://github.com/Glavnokoman/vuh/issues/23

Normal kernels:
CUDA:

  • 0.130667 s wall time
  • 0.130607 s CPU time

GLSL Vulkan:

  • 0.3833 s wall time
  • 0.382567 s CPU time

Empty kernels:
CUDA:

  • 0.048032 s wall time
  • 0.048047 s CPU time

GLSL Vulkan:

  • 0.317341 s wall time
  • 0.316852 s CPU time