Hello,
I’m just starting out with CUDA but have hit a couple of fundamental issues which hopefully have obvious solutions.
The following relates to a Windows 7 64-bit PC with an i7-2600 and a GTX 590.
The CUDA program is simple. I send some data to the GPUs, then the kernels (one on each GPU of the GTX 590) run in a tight loop and spit out the answers from each thread using printf.
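Roughly what I’m doing, stripped right down (the kernel body, sizes, and iteration counts here are placeholders, not my real code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: spin in a tight loop, then print a per-thread result.
__global__ void crunch(const float *in, int iters)
{
    float acc = in[threadIdx.x];
    for (int i = 0; i < iters; ++i)       // the "tight loop"
        acc = acc * 1.000001f + 1.0f;
    printf("thread %d: %f\n", threadIdx.x, acc);
}

int main()
{
    // One kernel on each GPU of the GTX 590 (devices 0 and 1).
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        float *d_in;
        cudaMalloc(&d_in, 256 * sizeof(float));
        cudaMemset(d_in, 0, 256 * sizeof(float));
        crunch<<<1, 256>>>(d_in, 1 << 20);
    }
    // Wait for both devices; the printf output is flushed here.
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }
    return 0;
}
```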
This works perfectly as long as the kernel only takes a few seconds to execute. But if I add another loop level so that the kernel should take about 90 seconds to run, the PC seems to lock up; it eventually recovers, and I get the correct answers after about 15 minutes.
Another level still, and instead of taking about 45 minutes, I lose the PC for a couple of hours, and this time I lose some data because the printf buffer is only a certain size. I don’t want to increase the buffer size until I understand what’s causing the lockups, though.
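(For when I do get to the bottom of it: as I understand it, the device-side printf FIFO can be grown with cudaDeviceSetLimit before the launch. The 8 MB figure below is just an example, not a recommendation.)

```cuda
// Enlarge the device-side printf FIFO (the default is small, on the
// order of 1 MB) before launching the kernel. Must be done per device.
cudaSetDevice(0);
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 * 1024 * 1024);
```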
Now, I’m assuming this stems from CUDA sharing the same hardware as the display driver, but I haven’t seen much documentation on the subject.
I have added the Registry entry that disables the kernel execution time limit and stops Windows restarting the display driver, and I don’t care if the screen goes blank (à la ZX80) while CUDA is doing its stuff, but I see no NVIDIA control for allocating resources between display and CUDA.
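For reference, this is the TDR (Timeout Detection and Recovery) key I changed, if I’ve read the documentation correctly; TdrLevel = 0 disables timeout detection entirely:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000
```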
If the advice is to separate the CUDA and display functions, I have enough slots on the motherboard to use a spare GTX 580 for the display and leave CUDA to the GTX 590. The question then is whether the 750 W PSU could power both simultaneously, even if nothing more than the desktop is running on the GTX 580.
…but I do have a spare PSU, so I guess I could hotwire it to be always on and use it to supply the +12 V rail to one of the cards. Would there be any risks in a card receiving power from two different PSUs? I’m guessing not, but there may be issues with the sequence in which the various voltage rails come up.
The easy answer is to have the host handle the outer loops so that only short CUDA kernel runs are needed, but where’s the fun in that? :)
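i.e. something along these lines, chunking the work so each individual launch stays short (crunch and d_in are the placeholders from above; the chunk sizes are made up):

```cuda
// Host drives the outer loop; each launch does a bounded slice of the
// work, so no single kernel runs long enough to stall the display.
const int totalIters     = 1 << 28;
const int itersPerLaunch = 1 << 20;   // keep each launch to ~a second

for (int done = 0; done < totalIters; done += itersPerLaunch) {
    crunch<<<1, 256>>>(d_in, itersPerLaunch);
    cudaDeviceSynchronize();          // flush printf, let the GUI breathe
}
```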
Any advice or abuse welcome…