atomic ops problems

Hi,

I have some problems with atomic instructions: I get incorrect results, timeouts, or a rebooting machine…
after replacing my old kernel

__global__ void houghKernel2(unsigned int* src, TColor* dst, int width, int height, int a, int n, float pia, unsigned int threshold, float n1)
{
const int x = blockDim.x * blockIdx.x + threadIdx.x;
const int y = blockDim.y * blockIdx.y + threadIdx.y;

...some logic...

1) dst[index]++;

}

with 1) => atomicInc((unsigned int*)&dst[index], 1);

It crashed the machine, or timed out, or does nearly nothing.
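For reference, CUDA's atomicInc is not a plain add: per the programming guide it increments modulo (val + 1), so atomicInc(ptr, 1) wraps between 0 and 1. A minimal contrast, assuming a plain unsigned int counter (not the original code):

__global__ void countHits(unsigned int* counter)
{
    // atomicInc: old = *counter; *counter = (old >= 1) ? 0 : old + 1;
    // i.e. the stored value cycles 0 -> 1 -> 0 -> 1 ...
    atomicInc(counter, 1);

    // atomicAdd: an unconditional increment of the counter
    atomicAdd(counter, 1u);
}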

I use XP, beta driver 180.x, VS2008, and CUDA 2.1.

Machine 1: 8600GT - 50% of runs reboot after executing, or time out
Machine 2: 8800GTS - 10% time out, 90% incorrect results (the array is nearly all 0)

My old version, which is not thread-safe with respect to CUDA memory, executes in about 1 sec with slightly varying but stable results.

Can somebody help me pleeeeease?

If this kernel were to act correctly, how many atomic increments would occur (total, over all memory locations)?

Max. 750M, avg ~100M

Can you elaborate on your kernel? From what you wrote I can't make any sense of it. When do you use dst? Why do you need an atomic add?

I'm doing a Hough transform on an image: for each edge pixel I iterate over the buffer (dst) along the corresponding sin/cos line. Because many edge pixels can map to the same Hough-space (buffer) coordinate, I need an atomic increment, or memory gets overwritten by another thread.
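As a rough sketch of that collision (names and parameters here are illustrative, not taken from the posted kernel): several edge pixels can compute the same accumulator bin, so the update has to be a single atomic read-modify-write:

// One vote loop for an edge pixel at (x, y); acc holds numAngles * numRadii
// unsigned int bins, zero-initialized before the kernel runs.
__device__ void voteForPixel(unsigned int* acc, int x, int y,
                             int numAngles, float angleStep, float rOffset)
{
    for (int t = 0; t < numAngles; ++t)
    {
        float theta = t * angleStep;
        float r = x * __cosf(theta) + y * __sinf(theta);
        int bin = t + numAngles * (int)(r + rOffset);

        // A plain acc[bin]++ is a read-modify-write in three steps, so two
        // threads hitting the same bin can each read the same old value and
        // one vote is lost; the atomic version serializes the update.
        atomicAdd(&acc[bin], 1u);
    }
}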

So how did it work before without the atomic operations? Can you post the kernel? Anyway, one thing that can cause the problem is if you try to do global writes based on these atomic operations, since global writes are not actually performed when your code executes but are queued up and flushed once in a while. Hopefully Nvidia will soon expose a global memory flush command.

__global__ void HoughKernel2(unsigned int* src, TColor* dst, int width, int height, int a, int n, float pia, unsigned int threshold, float n1)
{
    const int x = blockDim.x * blockIdx.x + threadIdx.x;
    const int y = blockDim.y * blockIdx.y + threadIdx.y;

#ifdef TESTRANGE
    if( y>=0 && y<height && x>=0 && x<width)
#endif
    {
        unsigned int v0 = src[width*y + x];
        if(v0 > threshold)
        {
            for (int t = 0; t < a; t++)
            {
                float o = t * pia;
                float r = x * __cosf(o) + y * __sinf(o);
                int r2 = (int)(r + n1);
                int index = t + a*r2;
#ifdef TESTRANGE
                if(index >= 0 && index < a*n)
#endif
                    //old: dst[index]++;
                    atomicInc((unsigned int*)&dst[index], 1);
            }
        }
    }
}

I see no reason why this should cause problems, unless you are accessing dst out of bounds. Have you tried running the code in emu mode?
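For what it's worth, a launch or out-of-bounds failure can often be caught on the host side; a small sketch using the runtime API of that era (cudaThreadSynchronize has since been superseded by cudaDeviceSynchronize):

#include <cstdio>
#include <cuda_runtime.h>

// Call right after a kernel launch; reports both bad launch configurations
// and errors raised while the kernel was running.
static void checkKernel(const char* name)
{
    cudaError_t err = cudaGetLastError();      // launch-configuration errors
    if (err == cudaSuccess)
        err = cudaThreadSynchronize();         // errors hit during execution
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
}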

No, but with the non-atomic increment it works.

Can somebody give me some advice? Please? :-) I am really stuck on this.

This worries me a tad. "Slightly varying"? What magnitude of errors are you getting? Sounds like it might be an array out-of-bounds error.

In most cases such problems are caused by a faulty kernel (memory bounds, compilation problems, etc.).
My experience tells me that the argument "but it works with the non-atomic version" is rather weak, because a faulty kernel can lead to absolutely unimaginable behavior.

I would recommend the following:

  1. Try to run on a different device (if the problem remains there, then your own device is fine).
  2. Try to exclude as much code as you can (except the faulty part) from the kernel, to localize the problem.
  3. If you end up with a very simple kernel of roughly 100M atomic calls that reproducibly fails, you should post that kernel here; I think in that case the guys from Nvidia will take an interest (see the stripped-down sketch after this list).
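Such a stripped-down reproducer might look roughly like this (a sketch with made-up sizes, not the original kernel):

// Every thread does many atomic increments into a small buffer; if this alone
// misbehaves on the device, the problem is not in the Hough logic.
__global__ void atomicStress(unsigned int* counters, int numCounters, int itersPerThread)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    for (int i = 0; i < itersPerThread; ++i)
        atomicAdd(&counters[(tid + i) % numCounters], 1u);
}

// Host side, hypothetical launch: 1024 blocks of 256 threads, 400 iterations
// each = ~100M increments; the sum over all counters should be 1024*256*400.
// atomicStress<<<1024, 256>>>(d_counters, 256, 400);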

How do you allocate dst? And why do you need the cast? Have you tried just using an int*?
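If dst is only ever used as a counter array, it could be allocated as unsigned int directly, which makes the cast unnecessary; a minimal sketch (with a and n as in the kernel signature, and d_dst a hypothetical name):

// Allocate and clear the accumulator as unsigned int so the kernel can take
// unsigned int* dst and call the atomic on it without a cast.
unsigned int* d_dst = 0;
size_t bytes = (size_t)a * n * sizeof(unsigned int);
cudaMalloc((void**)&d_dst, bytes);
cudaMemset(d_dst, 0, bytes);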

And I still recommend trying emu mode; sometimes it does find the problem.
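If I remember the CUDA 2.x toolchain correctly, device emulation is enabled at compile time, for example (file names hypothetical):

nvcc -deviceemu hough.cu -o hough_emu.exe

The kernel then runs on the CPU, so a debugger and printf work inside it.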