Problem with trigonometry? ;)

Hi. I have a strange problem. This function:

__global__ void trig(float *Ad,float a, int mat_dim)
{
  float s=sin(a),c=cos(a);
  for(int i=0;i<(mat_dim-1);i++)
	for(int j=i+1;j<mat_dim);j++)
	  Ad[i*mat_dim+j]=s+c+threadIdx.x;

WORKS fine, but this one:

__global__ void trig(float *Ad,float a, int mat_dim)
{
//  float s=sin(a),c=cos(a);
  for(int i=0;i<(mat_dim-1);i++)
	for(int j=i+1;j<mat_dim);j++)
	{
	  float s=sin(a),c=cos(a);
	  Ad[i*mat_dim+j]=s+c+threadIdx.x;
	};

DOESN'T! Does anybody know why?

Of course there should be a "}" closing each function. The program compiles fine with either function, but the first returns the proper matrix "Ad", while the second doesn't change "Ad" at all.

I really don’t believe either of those code blocks will compile, irrespective of the missing end of function braces.

Yes, there is a mistyped ")" in the "for" loop, but it's an obvious mistake I made when retyping the function for the forum :), isn't it? (For some reason I couldn't use copy/paste.) In the "real" program it is

[code]

for(int j=i+1;j<mat_dim;j++)

[/code]

and it compiles fine.

You have a major race condition there. All your threads attempt to write to Ad[i*mat_dim+j] at the same time. Neither of those kernels should work and even if they run, they will give garbage results (not to mention being slow).
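As a rough sketch of how the race could be avoided, each thread can be given its own (i, j) element of the upper triangle instead of looping over all of them. The kernel name `trig_per_element`, the 16x16 block shape, and the matrix size below are assumptions made for illustration; they are not taken from the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant: one thread per matrix element, so no two threads
// write to the same address.
__global__ void trig_per_element(float *Ad, float a, int mat_dim)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column

    // Only the strict upper triangle (j > i) is written, as in the original loops.
    if (i < mat_dim - 1 && j > i && j < mat_dim) {
        float s = sinf(a), c = cosf(a);
        Ad[i * mat_dim + j] = s + c + threadIdx.x;
    }
}

int main()
{
    const int mat_dim = 32;                       // assumed size
    const size_t bytes = mat_dim * mat_dim * sizeof(float);

    float *Ad = nullptr;
    cudaMalloc(&Ad, bytes);
    cudaMemset(Ad, 0, bytes);

    dim3 block(16, 16);                           // assumed block shape
    dim3 grid((mat_dim + block.x - 1) / block.x,
              (mat_dim + block.y - 1) / block.y);
    trig_per_element<<<grid, block>>>(Ad, 0.5f, mat_dim);

    // Copy the result back and print one upper-triangle element.
    float *A = new float[mat_dim * mat_dim];
    cudaMemcpy(A, Ad, bytes, cudaMemcpyDeviceToHost);
    printf("A[0][1] = %f\n", A[1]);

    delete[] A;
    cudaFree(Ad);
    return 0;
}
```

With this layout the result no longer depends on which thread happens to write last.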

Ad[i*mat_dim+j] is just a simple example, and I don't care about slow execution or unpredictable results. It could be A[0], so that every thread writes the same piece. (I just want to check why a "similar" function doesn't work in my program, where each thread writes its own piece of memory. First, my program is too long to post here; second, the problem can be checked with the simplest program.) In the first case ("sin" and "cos" outside the loops), the matrix is changed, so the function trig is executed. When I put the definitions of "s" and "c" inside the loop, the function is not called ("Ad" doesn't change), but the function with "s" and "c" inside works if I change the dimensions of the block :) With "s" and "c" outside, it ALWAYS works. It also ALWAYS works if I define s=const1, c=const2, but if I use sin() and cos() it doesn't.

Could you use the NVCC option to check how many registers and how much shared memory your kernels use? One major difference between your two kernels is the number of registers in use.
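Compiling with `nvcc --ptxas-options=-v` prints the register and shared memory usage of each kernel. The same numbers can also be queried at run time, and it may be worth checking the launch itself for errors: if a kernel needs more registers than the chosen block size allows, the launch fails and the kernel never runs, which would look like "Ad doesn't change". Whether that is what happens here is only a guess. The sketch below uses the second kernel from the first post (with the stray ")" removed); the matrix size, block size, and argument value are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Second kernel from the thread, typo fixed. The race on Ad is left in
// on purpose, since only the launch diagnostics matter here.
__global__ void trig(float *Ad, float a, int mat_dim)
{
    for (int i = 0; i < mat_dim - 1; i++)
        for (int j = i + 1; j < mat_dim; j++) {
            float s = sin(a), c = cos(a);
            Ad[i * mat_dim + j] = s + c + threadIdx.x;
        }
}

int main()
{
    const int mat_dim = 16;   // assumed size
    const int threads = 256;  // assumed block size

    // Ask the runtime what the compiled kernel needs.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, trig);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("registers/thread: %d, static shared: %zu bytes, "
           "max threads/block: %d\n",
           attr.numRegs, attr.sharedSizeBytes, attr.maxThreadsPerBlock);

    float *Ad = nullptr;
    cudaMalloc(&Ad, mat_dim * mat_dim * sizeof(float));
    cudaMemset(Ad, 0, mat_dim * mat_dim * sizeof(float));

    trig<<<1, threads>>>(Ad, 0.5f, mat_dim);

    // A failed launch (for example, too many resources requested)
    // is reported here, not by the launch statement itself.
    err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    // Errors that occur while the kernel runs show up after synchronizing.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(Ad);
    return 0;
}
```

If `threads` is larger than the reported `maxThreadsPerBlock`, the launch is expected to fail, and reducing the block size should make it succeed.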

I have faced a similar problem (http://forums.nvidia.com/index.php?showtopic=96829). Unpredictable results would appear when there was more than one active block per multiprocessor. I believe that the problem is due to the way GPUs handle multiple threads writing to the same memory location.