primary graphics card G80 questions


I have heard that CUDA won't run for more than 5 seconds on XP if the graphics card is the primary one. However, I have been having problems with the card as the primary even in applications that run for less time than that (they run fine with emulation on, but not otherwise). Could there be register conflicts or something similar when XP and CUDA both use the card?
Is there any solution to this apart from getting two graphics cards?

Thanks a bunch in advance, guys.


If everything works fine in emudebug and emurelease but not in debug and release, then maybe you're using too many registers or too much shared memory.

You get at most 16 registers per thread for 512 threads per block, 32 for 256, etc. (8192/threads, but you're better off using at least 16 threads/block for best scheduling).
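The 8192/threads rule can be sketched as a small host-side check. This is only an illustration: the function names are made up for this example, the 8192 figure is the one quoted in this post, and real hardware may allocate registers in coarser chunks than this simple division suggests.

```cpp
#include <cassert>

// Assumption from the post above: 8192 registers are shared by all
// threads of one block on a G80 multiprocessor.
const int kRegistersPerMultiprocessor = 8192;

// Largest per-thread register count that still lets one block of
// `threadsPerBlock` threads launch (hypothetical helper).
inline int maxRegistersPerThread(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return kRegistersPerMultiprocessor / threadsPerBlock;
}

// True if a kernel that uses `regsPerThread` registers fits within the
// register budget at this block size (hypothetical helper).
inline bool blockFits(int threadsPerBlock, int regsPerThread) {
    assert(threadsPerBlock > 0 && regsPerThread >= 0);
    return threadsPerBlock * regsPerThread <= kRegistersPerMultiprocessor;
}
```

For example, maxRegistersPerThread(256) gives 32, so under this rule a kernel reported at 33 or more registers per thread would not launch with 256 threads per block, while a 4x4 block (16 threads) leaves plenty of room.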

You can check the shared memory and register usage using the -cubin option of nvcc.

Interesting that you point that out, thanks. I am using 256 blocks with 256 threads per block. According to the guide this should usually work, but perhaps not in my case. What exactly goes into the registers? Local variables?

Thanks for any replies.


I believe that, in general, CUDA programs that run for less than 5 seconds per kernel call are okay, and that you are suffering from some programming/code/compiler/runtime problem, not a conflict with Windows (health warning: I use Linux).

It is worth compiling in debug mode, as some of the CUDA_SAFE_CALL() checks that may help you are removed when compiling for release.

Yes, local variables. As said, you could provide the -cubin option to find the number of registers used.

I just compile with -keep all the time and look in the .cubin file when I need to. It looks a bit like a bunch of initialised C structs.

In a section called code, you'll see the name of your CUDA device function and the number of registers.

If you've looked in the .ptx file, you'll see the pseudo-assembler, which appears to use lots of registers, but this doesn't show the actual register use, which you'll find in the cubin (I believe there is a further 'compilation' stage in the device driver, so I don't know that the cubin is definitive, but it's as close as I know).
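If you want to pull those numbers out automatically, something like the following rough sketch might do. It assumes the .cubin is plain text with lines of the form "name = myKernel" and "reg = 10" inside each code section. That layout is my assumption about what the file looks like, not a documented format, and registersPerKernel is a made-up helper name.

```cpp
#include <map>
#include <sstream>
#include <string>

// Scan cubin-like text and return a map from kernel name to register count.
// Assumed (unverified) layout of each section:
//   code {
//       name = myKernel
//       reg = 10
//   }
std::map<std::string, int> registersPerKernel(const std::string& cubinText) {
    std::map<std::string, int> result;
    std::istringstream in(cubinText);
    std::string line;
    std::string currentName;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key, eq, value;
        // Only "key = value" lines are of interest; skip everything else.
        if (!(fields >> key >> eq >> value) || eq != "=") continue;
        if (key == "name") {
            currentName = value;  // remember which kernel we are inside
        } else if (key == "reg" && !currentName.empty()) {
            result[currentName] = std::stoi(value);
        }
    }
    return result;
}
```

Read the .cubin into a string, pass it in, and compare each count against the per-thread register limit for your block size.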

Try reducing your thread dimensions to 4x4 or something (and maybe a small data set) to see if it starts working.