I am attempting to learn CUDA by writing an app that does brute-force collision detection on two polygon meshes. I have not implemented any sort of optimization; instead I just copy all the triangles of each mesh into shared memory, let each thread check two triangles for collision, and return the results in a 2D array that is sizeMesh1 x sizeMesh2. It seems to work well when the meshes are relatively small, with 320 triangles each. But if I try larger meshes with 4900 triangles each, I start getting some very strange artifacts on my screen (random pixels start flickering), and my app crashes. The strange behavior stays on the screen until I reboot. I'm guessing that I am using too much shared memory and perhaps it is interfering with my display memory. I am currently installing CUDA 2.1 and I was planning on looking into the visual profiler to see if that sheds light on my problem.
Is this a correct diagnosis of the problem?
Is there any way to fix the problem without rebooting?
Am I damaging my graphics card by doing this?
By the way, I am running on a Dell XPS 630 with a GTX 280 and two monitors.
Sounds like a classic case of out-of-bounds memory access. Quite simply, you’re writing to a part of memory you shouldn’t be (probably way past the bounds of an array you made), which is causing graphical corruption as a side effect. In Linux you can simply restart the display driver. I don’t think you can easily do this in Windows (without uninstalling and reinstalling the driver, which Windows will want you to reboot for anyway).
In theory you shouldn’t be damaging the graphics card, unless the driver lets you. The only way I can imagine this happening is if you somehow write into a part of memory that affects how the driver works, making it malfunction and potentially do something dangerous to the hardware. However, if NVIDIA knows what it's doing, and it seems to, you wouldn’t be able to write into protected memory regions like that.
Just a heads up: BSODs relating to nvdisp*.dll or some such are also a common result of out-of-bounds memory access.
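To make the out-of-bounds point concrete, here is a minimal sketch of a guarded pair-check kernel. Everything named here is an assumption for illustration, not the original poster's code: the `Triangle` layout, the 16x16 block size, the helper names, and the bounding-box test standing in for a real triangle-triangle test. The part that matters is the early return: 4900 is not a multiple of 16, so a rounded-up grid covers 4912 threads per dimension, and without the guard the extra threads would write past the end of the results array.

```cuda
#include <cuda_runtime.h>

// Hypothetical triangle layout: three vertices, 36 bytes in total.
struct Triangle { float3 v0, v1, v2; };

// Do the extents of two triangles overlap along one axis?
__device__ bool axisOverlap(float a0, float a1, float a2,
                            float b0, float b1, float b2) {
    float aMin = fminf(a0, fminf(a1, a2)), aMax = fmaxf(a0, fmaxf(a1, a2));
    float bMin = fminf(b0, fminf(b1, b2)), bMax = fmaxf(b0, fmaxf(b1, b2));
    return aMin <= bMax && bMin <= aMax;
}

// Stand-in for a real triangle-triangle test: compares bounding boxes only.
__device__ bool boxesOverlap(const Triangle& a, const Triangle& b) {
    return axisOverlap(a.v0.x, a.v1.x, a.v2.x, b.v0.x, b.v1.x, b.v2.x)
        && axisOverlap(a.v0.y, a.v1.y, a.v2.y, b.v0.y, b.v1.y, b.v2.y)
        && axisOverlap(a.v0.z, a.v1.z, a.v2.z, b.v0.z, b.v1.z, b.v2.z);
}

// One thread per (i, j) pair; results is an n1 x n2 row-major array.
__global__ void checkPairs(const Triangle* mesh1, int n1,
                           const Triangle* mesh2, int n2,
                           int* results) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    // Threads in the rounded-up part of the grid must not touch memory.
    if (i >= n1 || j >= n2) return;

    results[i * n2 + j] = boxesOverlap(mesh1[i], mesh2[j]) ? 1 : 0;
}

// Number of blocks needed to cover n items, rounded up.
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launches the kernel; all three pointers must be device pointers
// sized for n1 triangles, n2 triangles, and n1 * n2 ints respectively.
void launchCheckPairs(const Triangle* dMesh1, int n1,
                      const Triangle* dMesh2, int n2,
                      int* dResults) {
    const int side = 16;  // assumed block edge: 16 x 16 = 256 threads
    dim3 block(side, side);
    dim3 grid(blocksFor(n1, side), blocksFor(n2, side));
    checkPairs<<<grid, block>>>(dMesh1, n1, dMesh2, n2, dResults);
}
```

With these assumed sizes, `blocksFor(320, 16)` is exactly 20 blocks with no spare threads, while `blocksFor(4900, 16)` is 307 blocks with 12 spare threads per dimension, so a missing guard is one way a bug could stay hidden on the small meshes and show up on the large ones.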
This could be caused by one of two things, both of which I’ve experienced first hand:
Overclocking the card too far
[indent]I did this by overclocking the memory on the card too much, causing something resembling the Matrix intro. If you’re overclocking, perhaps consider not doing it. Check the temperature using EVGA Precision (it works with any manufacturer's card) and aim for under 80ish degrees C.[/indent]
Writing to global memory that isn’t yours
[indent]Check that you’re writing within the areas you’ve allocated for your kernel in your launch code. I accidentally wrote into the part where Windows kept the screen, which was very interesting! Your kernel shouldn’t launch if you try allocating too much shared memory, plus shared memory is exclusive to the currently running kernel.[/indent]
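A rough sketch of how to confirm both points from host code: work out how much shared memory the meshes would need, compare it with what the device offers, and check for errors after every launch. The `Triangle` layout and the helper names are hypothetical; the runtime calls are standard, though on CUDA 2.x the synchronize call is `cudaThreadSynchronize` rather than `cudaDeviceSynchronize`. Under this assumed 36-byte layout, two 320-triangle meshes need 23,040 bytes and two 4900-triangle meshes need 352,800 bytes, against 16 KB of shared memory per block on a GTX 280, so a launch asking for either amount would be expected to fail with an error rather than corrupt the display.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical triangle layout: three vertices, 36 bytes in total.
struct Triangle { float3 v0, v1, v2; };

// Bytes needed to hold both meshes in shared memory at once.
inline size_t sharedBytesNeeded(int n1, int n2) {
    return (static_cast<size_t>(n1) + static_cast<size_t>(n2)) * sizeof(Triangle);
}

// True if `bytes` fits in one block's shared memory on the given device.
// Returns false if the device properties cannot be queried.
inline bool fitsInSharedMemory(size_t bytes, int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return bytes <= prop.sharedMemPerBlock;
}

// Call right after a kernel launch. Reports launch failures (for example
// a shared memory request that is too large) and errors raised while the
// kernel was running (for example an out-of-bounds access).
inline bool launchSucceeded(const char* label) {
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Usage, with someKernel standing in for your own kernel:
//   size_t bytes = sharedBytesNeeded(n1, n2);
//   if (!fitsInSharedMemory(bytes, 0)) { /* process the meshes in tiles */ }
//   someKernel<<<grid, block, bytes>>>(/* arguments */);
//   if (!launchSucceeded("someKernel")) { /* stop before reading results */ }
```

Checking the return value of every `cudaMalloc` and `cudaMemcpy` in the same way helps too, since a 4900 x 4900 array of ints is about 96 MB and a failed allocation leaves the kernel writing through an invalid pointer.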
Just to add to what Smokey and yummig wrote: if you’re using Linux, there is a tool called valgrind that can check your program for memory access errors. Many of the advanced users on this forum have had good success using valgrind to flush out obscure memory access bugs from their kernels.