Hi,

I am trying to write a program for a C1060 (Fedora 10) and I am having problems with (what else?) __threadfence().

The problem is: I have to iterate a huge number of globally coupled maps, that is, maps whose update needs a value that depends on all the other maps.

Something like:

Mn(t+1)= some function + A(t)

where Mn is the nth map and A(t) is the sum of all Mn(t).

The problem is that a single thread can iterate only about 100 maps (memory problems…), so I need several blocks of nt threads to accommodate all the maps.

I tried with something like this kernel. Lots of stuff is missing, but the idea is there.

…

“res” holds the results and “avg” is where I store the average of each thread; both are defined in main().

__device__ double ta;

__global__ void calc(double *res,double *avg)

{

double x[100],tavg;

int nt = blockDim.x; // How many Threads in Block

int nb = gridDim.x; // How many Blocks in Grid

int i = threadIdx.x; // My thread

int b = blockIdx.x; // My block

int m; // My index

int n,t;

double tl; // accumulator for the total average

// my index

m=b*nt+i;

// initialize x for the thread and calculate local average

tavg=0.;

for(n=0;n<100;n++)

{ x[n]=(something randomized);

tavg+=x[n];

}

tavg=tavg/100;

///

// store in avg[m], sync, and build ta, the total average

avg[m]=tavg;

__threadfence(); // wait until all threads sync and avg is filled

if (m==0) //one of the threads does the total average

{ tl=0.;

for(n=0;n<nb*nt;n++)
tl+=avg[n];
ta=tl/(nb*nt);

res[0]=ta;

}

__threadfence(); // wait until all threads sync and ta is filled

Then the time loop starts, where more or less the same is done:

iterate the maps using “ta” as the total average;

each thread “m” calculates its local average and stores it in avg[m];

__threadfence(), total average, __threadfence();

save it in res[t] and loop.

Right now I am not interested in speed, just in getting a result (which I am not…). If __threadfence() is supposed to make sure that all the writes to global memory (i.e. avg) are done, why doesn’t this work? Or maybe it does work, the problem is elsewhere, and I am too stupid to see it?

I cannot see what is happening with the debugger, as it fails on this program (a different problem).

Is there a different approach for this?
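
One possible alternative, sketched here under assumptions rather than as a definitive fix: __threadfence() only orders the calling thread's own writes as seen by other threads and does not make a thread wait for the others, so it cannot act as a grid-wide barrier. Kernels launched one after another into the same stream do run in order, so each time step could be split into two launches. In the sketch below, map_step(), MAPS_PER_THREAD, iterate(), total_average() and run_maps() are hypothetical names, the maps are kept in global memory so they survive between launches, and the initial state (all maps equal to 1, total average 0) is a placeholder for the randomized start.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define MAPS_PER_THREAD 100  // about 100 maps per thread, as in the post

// Hypothetical stand-in for "some function" in the map update.
__device__ double map_step(double x) { return 0.5 * x; }

// Kernel 1: each thread advances its maps with the previous total average
// and stores its local average in avg[m].
__global__ void iterate(double *x, double *avg, const double *ta, int total)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= total) return;
    double a = *ta, tavg = 0.0;
    for (int n = 0; n < MAPS_PER_THREAD; n++) {
        int k = n * total + m;           // map n of thread m
        x[k] = map_step(x[k]) + a;
        tavg += x[k];
    }
    avg[m] = tavg / MAPS_PER_THREAD;
}

// Kernel 2: starts only after kernel 1 has completed, so avg[] is fully
// written. One thread sums it, as in the post (simple, not fast).
__global__ void total_average(const double *avg, double *ta, double *res,
                              int t, int total)
{
    double s = 0.0;
    for (int n = 0; n < total; n++) s += avg[n];
    *ta = s / total;
    res[t] = *ta;
}

// Host driver: fills res_host[0..steps-1] with the total average per step.
void run_maps(int nb, int nt, int steps, double *res_host)
{
    int total = nb * nt;
    // Placeholder initial state: every map starts at 1.0.
    std::vector<double> x0((size_t)total * MAPS_PER_THREAD, 1.0);
    double zero = 0.0, *x, *avg, *ta, *res;
    cudaMalloc(&x, x0.size() * sizeof(double));
    cudaMalloc(&avg, total * sizeof(double));
    cudaMalloc(&ta, sizeof(double));
    cudaMalloc(&res, steps * sizeof(double));
    cudaMemcpy(x, x0.data(), x0.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(ta, &zero, sizeof(double), cudaMemcpyHostToDevice);
    for (int t = 0; t < steps; t++) {
        iterate<<<nb, nt>>>(x, avg, ta, total);           // all blocks finish ...
        total_average<<<1, 1>>>(avg, ta, res, t, total);  // ... before this starts
    }
    // cudaMemcpy waits for the preceding kernels before copying.
    cudaMemcpy(res_host, res, steps * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(x); cudaFree(avg); cudaFree(ta); cudaFree(res);
}

int main()
{
    double res[3];
    run_maps(4, 64, 3, res);
    for (int t = 0; t < 3; t++) printf("res[%d] = %f\n", t, res[t]);
    return 0;
}
```

The single-thread sum mirrors the original kernel and favours getting a result over speed. A parallel reduction could replace it later without changing the two-launch structure.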

Thanks for any help.

GiP