Dangers of programming your GPU

[screenshot of the corrupted display]

Any clues? =p

I was writing an image-training application that allocated 200 720x480 grayscale images (float), so about 260 MB on my 9800GTX, plus some additional buffers that summed to about 6 MB, and I got a cool rain of random pixels on my screen plus a message saying the device driver was not responding ^_^

One question: if I allocate memory on the host, does that take up addressing space on the video board? It would make sense, I think: if I have 300 MB allocated on the video board and another 300 MB allocated on the host (cudaMallocHost), and the video board has to “SEE” that memory, then it should take up addressing space on the GPU. Is that the case?

(I run RivaTuner there, but I just use it to keep track of the GPU’s temperature; no overclocking, as you can see in the gadget. And this was a bad screenshot: it was way noisier than that, I just grabbed it at the wrong time.)

This can happen if your kernel writes past the end of the memory that you have allocated for it.
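For instance (a hypothetical sketch, not the original poster’s code), here is the classic pattern: the grid gets rounded up to a whole number of blocks, so the last block contains threads whose index falls past the end of the array, and without a guard they write out of bounds.

    __global__ void scale(float *img, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)      // without this guard, the rounded-up last block writes past the allocation
            img[i] *= s;
    }

The launch side typically computes the grid as (n + threadsPerBlock - 1) / threadsPerBlock, which starts more threads than there are elements, so the guard is mandatory.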

cudaMalloc allocates memory on the device. (make sure you check for error return values…)
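For example, a minimal checked allocation might look like this (a sketch; the size matches the 720x480 float images mentioned above):

    #include <cstdio>

    int main()
    {
        float *d_img = 0;
        // one 720x480 grayscale float image, as in the original post
        cudaError_t err = cudaMalloc((void **)&d_img, 720 * 480 * sizeof(float));
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaFree(d_img);
        return 0;
    }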

malloc/new/cudaMallocHost/cudaHostAlloc only allocate memory on the host. All copies from host->device are done with DMA, which doesn’t use any memory on the device.
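To illustrate (a sketch with made-up names; error checking omitted for brevity), the pinned buffer below lives entirely in host RAM, and the copy out of it is a DMA transfer:

    int main()
    {
        size_t bytes = 720 * 480 * sizeof(float);
        float *h_img = 0, *d_img = 0;
        cudaMallocHost((void **)&h_img, bytes);  // page-locked host RAM; uses no device memory
        cudaMalloc((void **)&d_img, bytes);      // device memory
        /* ... fill h_img on the host ... */
        cudaMemcpy(d_img, h_img, bytes, cudaMemcpyHostToDevice);  // DMA copy to the device
        cudaFreeHost(h_img);
        cudaFree(d_img);
        return 0;
    }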

No memory protection on a GPU? Perhaps this should go into the feature request thread (although it’s more of a hardware issue).

Yeah, I used to get this all the time. Just add some bounds-checking and infinite-loop-breaking code, which can be turned off by the preprocessor.
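Something along these lines, for example (a sketch; the GPU_CHECKS macro and the iteration cap are purely illustrative, enable them in debug builds with -DGPU_CHECKS):

    #ifdef GPU_CHECKS
    #define BOUNDS_CHECK(i, n) do { if ((i) >= (n)) return; } while (0)
    #define LOOP_GUARD(it, max) do { if (++(it) > (max)) return; } while (0)
    #else
    #define BOUNDS_CHECK(i, n)
    #define LOOP_GUARD(it, max)
    #endif

    __global__ void relax(float *buf, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        BOUNDS_CHECK(i, n);               // compiled away in release builds
    #ifdef GPU_CHECKS
        int iter = 0;
    #endif
        while (buf[i] > 1.0f) {           // some data-dependent loop
            buf[i] *= 0.5f;
            LOOP_GUARD(iter, 100000);     // bail out if it never converges
        }
    }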

“danger” is a bad characterization; it’s just corrupting GPU memory. Also, I recommend calling sync on Linux machines (I can’t find a Windows equivalent with a quick Google search) before invoking the kernel, to make sure your edited source files have actually been written to disk.
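On Linux you can also do this from inside the program with sync(2) right before the launch (a sketch; the kernel here is just a stand-in):

    #include <unistd.h>

    __global__ void risky(float *buf) { /* ... the kernel you don't quite trust yet ... */ }

    int main()
    {
        float *d_buf = 0;
        cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
        sync();                    // flush dirty pages to disk in case the machine hangs
        risky<<<4, 256>>>(d_buf);
        cudaThreadSynchronize();   // wait for the kernel to finish (CUDA 2.x-era API)
        cudaFree(d_buf);
        return 0;
    }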

regards,
Nicholas

There is memory protection; it’s just that the recovery functions weren’t quite robust enough before. I think this has been fixed now (with the very latest driver, I can SIGKILL a Linux app running an infinitely looping CUDA kernel and the machine keeps working absolutely fine; how do you like that). I think the XP/Vista protection enhancements come with 2.2, and the Linux ones come in the driver immediately after.

I don’t quite see what that has to do with memory protection. What I’m thinking of is this: if I have a kernel that starts zeroing out all of the memory on the card running my display, will it be caught, or will I watch X11 lock up? I’ve had at least one case where X locked up on me, and it turned out (when I ran the code through valgrind in emulation mode) that I was scribbling off the end of an array. I realise I should have kept a test case, but coming up with one is tedious when the machine requires rebooting every time you have a ‘success’…

There is memory protection. All of the addresses are virtual; there’s no way for a stray address to map onto other places in memory holding sensitive data. The problem was that, before, there were times when a context could do something bad (segfault, write to bad addresses, etc.), the driver would kill it, but the driver wouldn’t entirely recover: maybe X would crash, or the driver would crash later on. It was bad. This has been fixed.

Awesome :) When’s this post-2.2 Linux driver coming?

I’ve hit the same problem a few times. In those cases I was using a lot of device memory, so I think this problem occurs when a lot of device memory is in use. The machine I tested on runs XP SP2, with a single GPU serving both as the display card and for CUDA computing.

Understood, and good to hear it has been fixed :)

Yep, that happened to me. Make sure you free any allocated memory :)