I would like to ask whether anyone has experienced the same result as me. I developed an OpenACC code to run on both Ubuntu and Windows with the same CUDA version, but probably different graphics drivers (I don’t know whether that affects execution time or not). The result I got on Ubuntu was 0.99 s, compared to 4.8 s on Windows 10. Any tips on reducing the time? I did some profiling, and it seems like most of the time was spent in kernel launches. FYI: I’m not really knowledgeable in CUDA, which is why I’m using OpenACC to accelerate my program. Any tips would be welcome (except removing Windows from the university’s server workstation xD).
Is this on a GPU that is also running a Windows display?
Yes! The Intel Xeon E5-1620 v3 doesn’t have any integrated graphics… :/
Is there a considerable gap between when the kernel launch call returns and when the kernel actually begins executing? If so, it is most likely Windows batching calls to the GPU, which is customary with the WDDM device driver model on Windows.
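One quick way to check this with the PGI/NVHPC OpenACC runtime (assuming that’s the compiler in use; `./myprogram` below is a placeholder for your actual binary) is to enable the runtime’s built-in per-region profiler:

```shell
# Hypothetical binary name; run from the directory containing your executable.
# With PGI_ACC_TIME=1 the OpenACC runtime prints, at program exit, the time
# spent in each compute region, broken down into launch overhead and
# kernel execution time.
PGI_ACC_TIME=1 ./myprogram
```

If the reported launch overhead dwarfs the actual kernel time on Windows but not on Ubuntu, driver-level call batching is the likely culprit.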
The most straightforward way around the problem is to use nvidia-smi to force the graphics adapter to the TCC driver model with -dm. But since you’re using it as the primary display adapter, I don’t think that will work for you. My second suggestion would be to switch the display to integrated graphics, but that doesn’t seem to be an option with your CPU. An alternative card to drive the display would work, if you have a spare.
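For reference, switching a non-display GPU to TCC looks roughly like this (run from an elevated command prompt; the GPU index `0` is an assumption and may differ on your system):

```shell
# List GPUs and show their current driver model (WDDM or TCC)
nvidia-smi

# Switch GPU 0 to the TCC driver model.
# -dm takes 0 for WDDM, 1 for TCC; requires admin rights and a reboot.
nvidia-smi -i 0 -dm 1
```

Note that the driver will refuse TCC on a GPU that is attached to a display, which is exactly why this won’t help while the card is your primary adapter.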