Seg Faults with GTX 295 in compute exclusive mode

Hi Everyone,

I currently have a GTX 295 card in my Debian Squeeze x86_64 system; it shows up as two GPUs under CUDA 2.2 with the 185.18.36 driver. I set each GPU to compute-exclusive mode so that each independent program I launch would be assigned a free GPU. To test this configuration, I ran two independent instances of a program I wrote, which works fine on one GPU in default mode. When I start the first instance and then start the second in another terminal shortly afterward, the first instance segfaults, and I get the following in dmesg:

[14091.451808] test_cuda[17877]: segfault at 8 ip 00007fe14792a3a7 sp 00007fffb83dab80 error 4 in libcudart.so.2.2[7fe14790d000+3e000]
[14094.485504] NVRM: Xid (0005:00): 13, 0001 00000000 000050c0 00000368 00000000 00000080
[14097.028211] NVRM: Xid (0006:00): 13, 0001 00000000 000050c0 00000368 00000000 00000080

Would anyone know why I am getting this error? Please let me know if you have any ideas!

Jonathan
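For the setup described in the first post, a quick way to confirm what each process actually sees is to print every device's compute mode and then let the runtime create a context without calling `cudaSetDevice`. Below is a minimal sketch under that assumption. The helper names `check`, `modeName` and `reportAndAcquireDevice` are hypothetical. The sketch relies on the behavior the post expects and the CUDA programming guide's compute-mode section describes: the runtime skips a device that is in exclusive mode and already in use. Note that exclusivity applies per context (per host thread/process in this test), not per kernel launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints the failing call and the runtime's error string; returns false on failure.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Human-readable names for the compute modes that exist in CUDA 2.2.
static const char* modeName(int mode) {
    switch (mode) {
        case cudaComputeModeDefault:    return "default";
        case cudaComputeModeExclusive:  return "exclusive";
        case cudaComputeModeProhibited: return "prohibited";
        default:                        return "other";
    }
}

// Hypothetical diagnostic: lists every device's compute mode, then creates a
// context WITHOUT calling cudaSetDevice so the runtime chooses the device.
// Returns the device this process ended up on, or -1 if any call failed.
int reportAndAcquireDevice() {
    int count = 0;
    if (!check(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return -1;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (!check(cudaGetDeviceProperties(&prop, d), "cudaGetDeviceProperties"))
            return -1;
        std::printf("device %d (%s): compute mode %s\n", d, prop.name,
                    modeName(prop.computeMode));
    }
    // cudaFree(0) is a common idiom for forcing the lazily created context into
    // existence; if every exclusive GPU is already taken, this is where it fails.
    if (!check(cudaFree(0), "context creation")) return -1;
    int dev = -1;
    if (!check(cudaGetDevice(&dev), "cudaGetDevice")) return -1;
    std::printf("this process got device %d\n", dev);
    return dev;
}
```

Calling `reportAndAcquireDevice()` from a one-line `main` in two terminals at once should, if the configuration behaves as intended, report two different device numbers. A third concurrent copy would be expected to fail at context creation with a reported error rather than crash.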

That looks like a driver problem to me. Those Xid 13 messages suggest that the driver was forced to hard-reset both devices. You might want to post a complete repro case and file a proper bug report. You might also consider upgrading to the 190-series drivers and CUDA 2.3 to see whether the problem goes away.
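The segfault line in the first post can also be decoded before filing a report. Below is a minimal sketch of two hypothetical host-only helpers, `parseSegfault` and `parseXid`. They contain no device code, so they compile as ordinary C++ under nvcc or g++. The first turns the line into an offset inside `libcudart.so.2.2`, which objdump or addr2line can map to a function if symbols are available, and decodes the low x86 page-fault error bits. The second pulls the PCI location and Xid number out of the NVRM lines. The line formats are assumed from the dmesg output in the post, not from any kernel or driver specification.

```cuda
#include <cstdio>
#include <cstring>
#include <string>

// Fields of a kernel "segfault at ..." log line, in the format shown in the post.
struct SegfaultInfo {
    unsigned long long faultAddr = 0;  // data address whose access faulted
    unsigned long long ip = 0;         // instruction pointer at the fault
    unsigned long errorCode = 0;       // x86 page-fault error code
    std::string library;               // mapping that contains ip
    unsigned long long base = 0;       // load address of that mapping
    unsigned long long offset = 0;     // ip - base: the address to look up in the .so
    bool protViolation = false;        // bit 0: 0 = page not present
    bool write = false;                // bit 1: 0 = read access
    bool userMode = false;             // bit 2: 1 = fault in user mode
};

// Hypothetical helper; returns false if the line does not match the format.
bool parseSegfault(const std::string& line, SegfaultInfo& out) {
    const char* p = std::strstr(line.c_str(), "segfault at ");
    if (p == nullptr) return false;
    char lib[256] = {0};
    unsigned long long size = 0;
    int got = std::sscanf(p,
        "segfault at %llx ip %llx sp %*llx error %lu in %255[^[][%llx+%llx]",
        &out.faultAddr, &out.ip, &out.errorCode, lib, &out.base, &size);
    // Require all six fields and an ip that lies inside the reported mapping.
    if (got != 6 || out.ip < out.base || out.ip >= out.base + size) return false;
    out.library = lib;
    out.offset = out.ip - out.base;
    out.protViolation = (out.errorCode & 1u) != 0;
    out.write = (out.errorCode & 2u) != 0;
    out.userMode = (out.errorCode & 4u) != 0;
    return true;
}

// Hypothetical helper: extracts the PCI location and Xid number from an
// "NVRM: Xid (...): N, ..." line.
bool parseXid(const std::string& line, std::string& pciLoc, int& xid) {
    const char* p = std::strstr(line.c_str(), "NVRM: Xid (");
    if (p == nullptr) return false;
    char loc[64] = {0};
    if (std::sscanf(p, "NVRM: Xid (%63[^)]): %d", loc, &xid) != 2) return false;
    pciLoc = loc;
    return true;
}
```

For the segfault line in the post, this yields a fault address of 0x8 and an offset of 0x1d3a7 into `libcudart.so.2.2`. The error code is 4, which means a user-mode read of a page that is not present. A read at a small address like 0x8 typically means code dereferenced a field of a NULL pointer. The two Xid lines decode to PCI locations 0005:00 and 0006:00, presumably the two GPUs on the GTX 295. That fits the reading above that both devices were reset.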

This is all good advice. Also, are you checking all of your error codes? Can you post a repro?
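On checking error codes: every runtime call returns a `cudaError_t`, but a `<<<...>>>` kernel launch returns nothing. Launch errors therefore have to be fetched with `cudaGetLastError`, and faults during execution only surface at the next synchronizing call. Below is a minimal sketch of checking every step. The `scale` kernel and the `scaleOnGpu` helper are hypothetical stand-ins for the poster's unseen program. The sketch uses `cudaDeviceSynchronize`, which later toolkits introduced as the replacement for `cudaThreadSynchronize`; on CUDA 2.2/2.3 the older name is the one that exists.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Prints the failing step and the runtime's error string; returns false on failure.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical example kernel: multiplies each element by factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Scales host[0..n) on the GPU, checking every step; returns false at the
// first step that fails.
bool scaleOnGpu(float* host, int n, float factor) {
    if (n <= 0) return true;  // nothing to do; a 0-block launch would be an error
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dev = nullptr;
    if (!check(cudaMalloc(&dev, bytes), "cudaMalloc")) return false;

    bool good = check(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice),
                      "copy to device");
    if (good) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, n, factor);
        // The launch itself returns nothing; configuration errors are fetched here.
        good = check(cudaGetLastError(), "kernel launch");
    }
    // Faults during kernel execution surface at the next synchronizing call.
    if (good) good = check(cudaDeviceSynchronize(), "kernel execution");
    if (good) good = check(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost),
                           "copy to host");
    // Free even on failure; after a sticky error this may report a failure too.
    (void)check(cudaFree(dev), "cudaFree");
    return good;
}
```

With checks like these in place, a busy or reset GPU should show up as a printed error string at a specific step instead of an unexplained segfault later in the program.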

Hey guys, thanks for the quick response!

I was able to get it working with the 190.18 beta drivers that are recommended for use with CUDA 2.3. However, with the newer 190.32 and 190.36 drivers, my code still segfaults.

Is this something I should submit a bug report for? If so, what is the process to do so?

Jonathan

Post a repro case so I can try it. :) (Alternatively, PM me the source and I'll try it.)