I am using a new machine with a GeForce 9800 GX2. The problem that has been giving me a headache is that initialization of this device is always very slow. I mean, the first function that has anything to do with the card takes a long time, no matter what it is (cudaSetDevice, cudaGetDeviceCount, cudaMalloc…). Can anybody help me with this?
That’s probably the overhead of context creation, although that would be true for cudaMalloc and not cudaSetDevice, cudaGetDevice, or cudaGetDeviceCount. What are you seeing as “very slow,” though? It shouldn’t add more than a few hundred milliseconds unless your PCIe bandwidth is terrible.
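One way to see where the start-up time actually goes is to time the first few runtime calls separately. The following is only a rough sketch, not code from this thread: it assumes a current toolkit whose nvcc accepts C++11, that device 0 is the card of interest, and it uses two hypothetical helpers (time_ms and check) that are not part of CUDA. cudaFree(0) is used here as the commonly cited idiom for forcing the runtime to create its context.

```cuda
// Sketch: time the first CUDA runtime calls one by one.
// Assumptions (not from the thread): C++11-capable nvcc, device 0 is the GPU
// under test, and time_ms / check are hypothetical helpers defined below.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock duration of one callable, in milliseconds.
template <typename F>
static double time_ms(F&& call) {
    auto start = std::chrono::steady_clock::now();
    call();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: report a failed runtime call without aborting.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    }
}

int main() {
    int count = 0;
    void* buf = nullptr;

    // Each call is timed on its own so the slow one stands out.
    double t_count  = time_ms([&] { check(cudaGetDeviceCount(&count), "cudaGetDeviceCount"); });
    double t_set    = time_ms([&] { check(cudaSetDevice(0), "cudaSetDevice"); });
    double t_ctx    = time_ms([&] { check(cudaFree(nullptr), "cudaFree(0)"); });
    double t_malloc = time_ms([&] { check(cudaMalloc(&buf, 1 << 20), "cudaMalloc"); });

    std::printf("devices found:      %d\n", count);
    std::printf("cudaGetDeviceCount: %.1f ms\n", t_count);
    std::printf("cudaSetDevice:      %.1f ms\n", t_set);
    std::printf("cudaFree(0):        %.1f ms\n", t_ctx);
    std::printf("cudaMalloc (1 MiB): %.1f ms\n", t_malloc);

    check(cudaFree(buf), "cudaFree");
    return 0;
}
```

If the file is saved as init_timing.cu (a made-up name), it should build with something like nvcc -std=c++11 init_timing.cu -o init_timing. Whichever call comes first and touches the driver will likely absorb most of the delay, so reordering the calls and rerunning can help confirm that the cost is one-time initialization rather than the call itself.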
It’s more than 2 seconds. Actually, it’s a cluster in which 8 nodes are equipped with this type of card. I just don’t know why calling a CUDA function would be so slow…
I’m going to need more information on your setup (OS and hardware), how you’re running your app, and what app (meaning source) you’re using to test this, then. CUDA initialization shouldn’t take this much time in 2.0.
For example, the sample “scan” in the installation package. The total execution time is 7 seconds. I can see that it waits quite a while before printing anything (the print statements come after the first CUDA function call).
Slow start-up with multiple GPUs and no X running is a known problem. We are working on improving the performance.
OK, I see. Thank you very much.
Apologies for digging up this old thread. I’d like to know: has this issue been fixed, or is there a way to work around it?
I’m using a non-NVIDIA PCI card for display (X), and CUDA 2.3 is installed for the PCI-E NVIDIA card. CUT_DEVICE_INIT is taking about 1.X seconds.
Previously, when I was using the PCI-E NVIDIA card for X, CUT_DEVICE_INIT was almost immediate.