Performance loss porting code from Ubuntu to Windows 10


I ported my CUDA code from Ubuntu 20.04 (Code::Blocks) to Windows 10 (Visual Studio 2019) on the same hardware (a dual-boot machine). The latest CUDA toolkit is installed on both OSes. After compiling with nvcc, the Windows binary is approximately 15x slower than the Ubuntu binary.

Without posting the entire code for now, is there a “classic” mistake or trap when porting code to another IDE/OS?


Building a debug project instead of a release project.
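One way to catch this from inside the program is to print which build flavor the binary was compiled as. The sketch below assumes nvcc's `__CUDACC_DEBUG__` macro (defined when device code is compiled with `-G`) and MSVC's `_DEBUG` macro (defined when the debug runtime is selected); the helper names are hypothetical and not from the original post.

```cuda
#include <cstdio>

// True when nvcc compiled this file in device-debug mode (-G).
// nvcc defines __CUDACC_DEBUG__ in that mode, and -G disables most
// device-code optimizations.
constexpr bool deviceDebugBuild() {
#ifdef __CUDACC_DEBUG__
    return true;
#else
    return false;
#endif
}

// True when the host compiler defines _DEBUG, as MSVC does for the
// debug runtime (/MDd, /MTd). Other host compilers may not set it.
constexpr bool hostDebugBuild() {
#ifdef _DEBUG
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a build counts as optimized only if neither
// the host side nor the device side was compiled as debug.
constexpr bool isOptimizedBuild(bool hostDebug, bool deviceDebug) {
    return !hostDebug && !deviceDebug;
}

int main() {
    printf("host debug:   %s\n", hostDebugBuild() ? "yes" : "no");
    printf("device debug: %s\n", deviceDebugBuild() ? "yes" : "no");
    if (!isOptimizedBuild(hostDebugBuild(), deviceDebugBuild())) {
        printf("warning: this looks like a debug build, expect slow kernels\n");
    }
    return 0;
}
```

Running this once on each OS should show quickly whether the two binaries were built with comparable settings.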


Oh man … why didn’t I ask before?
Would have saved me lots of time.
Many thanks!

I’ve now gone from 15x slower to 1.3x slower. Not where I was before, but quite acceptable.
Nevertheless, I’m wondering why I can’t reach Ubuntu speed … ?!

My guess is that the remaining difference is attributable not to kernel execution but to some other aspect. I would expect a given kernel launch to take the same amount of time to execute whether the OS is Windows or Linux, all other things (GPU, machine config, kernel code, input parameters, grid config, compile settings, etc.) being equal.

So my guess is that the difference is in something that should be pretty evident from a profiler timeline view. For example, if the GPU is running under the WDDM driver model, Windows’ own use of the GPU could be getting in the way of the CUDA work.
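As a rough first check before opening a profiler, one could time a single kernel launch with CUDA events on both OSes and report whether the device uses the TCC driver. This is only a sketch: `scaleKernel` and `timeKernelMs` are hypothetical stand-ins for the real code, and device 0 is assumed. `cudaDeviceProp::tccDriver` is 0 for WDDM on Windows and also 0 on Linux, so it only distinguishes the two Windows driver models.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel being compared.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one launch with CUDA events. Returns milliseconds,
// or -1.0f if event creation or the launch failed.
float timeKernelMs(float* d_data, int n) {
    assert(n > 0);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    // Warm-up launch so one-time startup cost is not measured.
    scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    scaleKernel<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    printf("%s: %s\n", prop.name,
           prop.tccDriver ? "TCC driver"
                          : "not TCC (WDDM on Windows, or a non-Windows OS)");

    const int n = 1 << 20;
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));
    printf("kernel time: %.3f ms\n", timeKernelMs(d_data, n));
    cudaFree(d_data);
    return 0;
}
```

If the event-timed kernel duration matches on both systems while the wall-clock time of the whole program does not, the gap is more likely in launch overhead, memory transfers, or host code than in the kernel itself.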
