samples crashing - help? simpleGL and boxFilter, on device, crash system

Greetings,

I have just lost a bunch of work due to a crash, and so I need help.

Yes, yes, I back up! But not every 15 minutes. Can one set autosave to ask permission before saving?

The SDK samples simpleGL and boxFilter are capable of bringing my system to an unrecoverable halt,
requiring a hard reboot. When I run them, it’s with the SDK makefile, with no modifications to anything.
The crash always occurs after the OpenGL-generated image appears, but beyond that it’s irregular.

It doesn’t happen every time. With simpleGL it’s about 20% of runs, and with boxFilter it’s more, but my sample size is small.

Any suggestions about what to look into? I figure I’d better get this sorted out before I get into trouble.

THANKS

clanz

Care to share your hardware, OS, driver, and SDK versions? (At a guess I would say dodgy hardware, unless you have pre-existing OpenGL problems.)

There are two Tesla C1060s.

SDK version 2.3 for Linux (Ubuntu).

-------------------------------DEVICE QUERY OUTPUT------------------

Device 0: "Tesla C1060"
  CUDA Driver Version:                            2.30
  CUDA Runtime Version:                           2.30
  CUDA Capability Major revision number:          1
  CUDA Capability Minor revision number:          3
  Total amount of global memory:                  4294705152 bytes
  Number of multiprocessors:                      30
  Number of cores:                                240
  Total amount of constant memory:                65536 bytes
  Total amount of shared memory per block:        16384 bytes
  Total number of registers available per block:  16384
  Warp size:                                      32
  Maximum number of threads per block:            512
  Maximum sizes of each dimension of a block:     512 x 512 x 64
  Maximum sizes of each dimension of a grid:      65535 x 65535 x 1
  Maximum memory pitch:                           262144 bytes
  Texture alignment:                              256 bytes
  Clock rate:                                     1.30 GHz
  Concurrent copy and execution:                  Yes
  Run time limit on kernels:                      No
  Integrated:                                     No
  Support host page-locked memory mapping:        Yes
  Compute mode:                                   Default (multiple host threads can use this device simultaneously)

But what card are you doing the OpenGL rendering on? Neither C1060 can drive a display.

The card itself says "e-GeForce 7200 GS Rev: 1.0".

My last post seems to have disappeared into the ether, so here’s more info that I tried to send before.

With the whole setup we got, among other things, an installation CD: "EVGA version 08-956-01-1".

Also, a page that came with the installation documentation (which we never had to use…) says we have:

"Graphics: Intel Graphics Media Accelerator 3100 (GMA 3100)"

The card actually cabled to the display is the "e-GeForce 7200 GS Rev: 1.0".

Ubuntu 9.04, kernel 2.6.28-15-generic

Motherboard: P6T6 Revolution rev. 397.02 (Intel Xeon 3500)

It is probable that SDK OpenGL examples were not intended to work the way you are trying to use them, so I wouldn’t read too much into them crashing. I don’t think the authors were expecting that the rendering and CUDA computation phases would be happening on different GPUs.
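For what it’s worth, the CUDA/OpenGL interop path in the 2.x runtime assumes the CUDA work happens on the same GPU that owns the GL context. A rough sketch of the call involved (this is not the sample’s actual code, and device 0 is just an example index) looks like this:

#include <GL/gl.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

/* Call this after the OpenGL context exists (e.g. after glutCreateWindow),
   and before any other CUDA runtime calls, to tell CUDA which device owns
   the GL context. Device 0 is only an example index. */
void bind_cuda_to_gl_device(void)
{
    cudaGLSetGLDevice(0);
}

In your setup the display is driven by a 7200 GS, which isn’t a CUDA device, so that assumption doesn’t hold.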

OK, that’s good news. But how can you tell whether both GPUs are being used, and is there a way to force the use of only one?

Is it just that the nvcc compiler “knows” I have two GPUs and automatically uses them both?

THANKS AGAIN for all your help

By default, a CUDA kernel only ever runs on a single CUDA-capable GPU. Unless your code explicitly implements some sort of multi-GPU framework of its own, it can only ever run on a single device.
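To force the use of a particular GPU, call cudaSetDevice() before any other CUDA work in that host thread. A minimal sketch (device 0 is just an example; substitute whichever index deviceQuery reports for the card you want):

#include <stdio.h>
#include <cuda_runtime.h>

/* Trivial kernel, just so there is some work to land on the chosen device. */
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    /* Pin this host thread to one device before any allocations or kernels.
       0 is only an example index. */
    cudaSetDevice(0);

    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaThreadSynchronize();  /* cudaDeviceSynchronize() on newer toolkits */

    cudaFree(d_data);
    return 0;
}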

Further to that, if you do nothing in the way of GPU selection in your code, the driver just chooses the first enumerated device. NVIDIA provides a utility called nvidia-smi which lets you further control this behaviour by assigning a given device one of three possible statuses: “normal”, “compute-exclusive” (allow exactly one running CUDA context on this device), and “compute-prohibited” (don’t allow CUDA kernels on this device at all).
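If you want to see from inside a program how each device has been configured, the runtime exposes the mode through the computeMode field of cudaDeviceProp. A small sketch:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        const char *mode =
            (prop.computeMode == cudaComputeModeDefault)    ? "default" :
            (prop.computeMode == cudaComputeModeExclusive)  ? "compute-exclusive" :
            (prop.computeMode == cudaComputeModeProhibited) ? "compute-prohibited" :
                                                              "unknown";

        printf("Device %d (%s): compute mode = %s\n", dev, prop.name, mode);
    }
    return 0;
}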

The compiler has nothing to do with any of this; it is all controlled by the driver at runtime.