Of course there should be a “}” closing the functions. The program with both functions compiles OK, but the first one returns the proper matrix “Ad”, while the second one doesn't change “Ad” at all.
Yes, there is a miswritten “)” in the “for” loop, but it's an obvious mistake I made while retyping the function for the forum :), isn't it? (For some reason I couldn't use copy/paste.) In the “real” program it is
You have a major race condition there. All your threads attempt to write to Ad[i*mat_dim+j] at the same time. Neither of those kernels should work and even if they run, they will give garbage results (not to mention being slow).
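A minimal sketch of the race-free pattern, assuming a square row-major matrix `Ad` of size `mat_dim` × `mat_dim` and a 2D launch: each thread derives its own (i, j) from `blockIdx`/`threadIdx` and writes only that one element, so no two threads touch the same address. The kernel name `trig_per_element`, the launcher `launch_trig`, and the `sinf(i) * cosf(j)` body are hypothetical stand-ins, since the original kernels aren't shown in the thread.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Row-major flat index of element (i, j) in a mat_dim x mat_dim matrix.
__host__ __device__ inline int flat_index(int i, int j, int mat_dim) {
    return i * mat_dim + j;
}

// Hypothetical kernel: each thread handles exactly one (i, j) pair taken from
// its block and thread indices, so every write to Ad goes to a distinct element.
__global__ void trig_per_element(float *Ad, int mat_dim) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column handled by this thread
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row handled by this thread
    if (i < mat_dim && j < mat_dim) {               // guard threads in partial blocks
        float s = sinf((float)i);                   // placeholder computation
        float c = cosf((float)j);
        Ad[flat_index(i, j, mat_dim)] = s * c;
    }
}

// Launches enough 16x16 blocks to cover the matrix and returns any error.
cudaError_t launch_trig(float *Ad, int mat_dim) {
    assert(mat_dim > 0);
    dim3 block(16, 16);
    dim3 grid((mat_dim + block.x - 1) / block.x,
              (mat_dim + block.y - 1) / block.y);
    trig_per_element<<<grid, block>>>(Ad, mat_dim);
    cudaError_t err = cudaGetLastError();            // errors in the launch itself
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors while running
    return err;
}
```

Returning the `cudaError_t` from both calls means a failed launch shows up as an error code instead of as an unchanged `Ad`.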
… Ad[i*mat_dim+j] is just a simple example. I don't care about slow execution or an unpredictable result; it could just as well be A[0], so that every thread writes the same piece. (I just want to check why a “similar” function doesn't work in my program, where each thread writes its own piece of memory. But, first, my program is too long to post here, and second, the problem can be checked on the simplest program.) In the first case (“sin” and “cos” outside the loops), the matrix is changed, so the function trig is executed. When I put the definitions of ‘s’ and ‘c’ inside the loop, the function is not called (“Ad” doesn't change). But the function with ‘s’ and ‘c’ inside works if I change the block dimensions :) (With “s” and “c” outside, it ALWAYS works, and it also ALWAYS works if I define s=const1, c=const2, but if I use sin() and cos(), it doesn't.)
Could you use the NVCC option to check how many registers and how much shared memory your kernels use? One major difference between your two kernels is the number of registers in use.
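At compile time, `nvcc -Xptxas -v` (equivalently `--ptxas-options=-v`) prints the register and shared-memory usage of each kernel. The same numbers can be read at runtime with `cudaFuncGetAttributes`, sketched below. One plausible explanation for the symptoms above, offered as an assumption: if `sin()`/`cos()` inside the loop raise the register count, a large block can exceed the per-block register budget, and the launch then fails (typically with `cudaErrorLaunchOutOfResources`), which from the host looks as if the kernel was never called. `my_kernel` is a hypothetical stand-in for the kernel under test, and `fits_register_budget` is only a rough estimate that ignores register allocation granularity; `attr.maxThreadsPerBlock` is the limit the runtime itself reports for that kernel.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in: replace with the kernel being investigated.
__global__ void my_kernel(float *Ad, int n) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n) Ad[k] = sinf((float)k) * cosf((float)k);
}

// Rough estimate only: ignores register allocation granularity.
bool fits_register_budget(int regs_per_thread, int threads_per_block,
                          int regs_per_block) {
    return (long long)regs_per_thread * threads_per_block <= regs_per_block;
}

// Prints my_kernel's resource usage and checks a proposed block size on device 0.
bool report_and_check(int threads_per_block) {
    assert(threads_per_block > 0);
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, my_kernel) != cudaSuccess) return false;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    std::printf("registers/thread: %d, static shared mem: %zu bytes, "
                "max threads/block for this kernel: %d\n",
                attr.numRegs, attr.sharedSizeBytes, attr.maxThreadsPerBlock);
    // maxThreadsPerBlock already accounts for this kernel's register usage.
    return threads_per_block <= attr.maxThreadsPerBlock &&
           fits_register_budget(attr.numRegs, threads_per_block, prop.regsPerBlock);
}
```

Comparing the output for the “s/c outside” and “s/c inside with sin()/cos()” versions should show whether the block sizes that fail are the ones that exceed the reported limit.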