On device 1 of a GTX 295 I am having problems adding together answers produced by separate threads.
(If all threads return their answers to the host and I add them together on the PC, all is well.)
So in desperation I am re-reading the documentation.
Harris’ reduce6 code makes great play of minimising the number of times __syncthreads() is used.
But it seems to me there may be a hole: if it were used as part of a bigger kernel,
then threads 32 and above could overwrite the shared array sdata before thread 0 had finished.
To prevent this, should I put another __syncthreads() after the code which writes to g_odata?
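To make the concern concrete, here is a minimal sketch of the situation I have in mind. It is not Harris' code: the kernel name chunkSums, the block size BLOCK and the chunked input layout are all hypothetical, and the unsynchronised tail of reduce6 is replaced by a simpler stand-in in which thread 0 alone finishes the sum. The point is only where the extra barrier would go when sdata is reused.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 128   // hypothetical block size

// Hypothetical kernel: one block sums `chunks` consecutive chunks of BLOCK
// ints, reusing the same shared array sdata for every chunk.
__global__ void chunkSums(const int *g_idata, int *g_odata, int chunks)
{
    __shared__ int sdata[BLOCK];
    unsigned int tid = threadIdx.x;

    for (int c = 0; c < chunks; ++c) {
        sdata[tid] = g_idata[c * BLOCK + tid];
        __syncthreads();              // all of sdata is now written

        // Unsynchronised tail (a stand-in for reduce6's last-warp stage):
        // only thread 0 reads sdata here, the other threads fall through.
        if (tid == 0) {
            int sum = 0;
            for (int i = 0; i < BLOCK; ++i) sum += sdata[i];
            g_odata[c] = sum;
        }

        // Extra barrier: without it, threads 1..BLOCK-1 could start the next
        // iteration and overwrite sdata while thread 0 is still reading it.
        __syncthreads();
    }
}

int main()
{
    const int chunks = 4;
    const int n = chunks * BLOCK;
    int h_in[n], h_out[chunks];
    for (int i = 0; i < n; ++i) h_in[i] = i % 7;

    int *d_in, *d_out;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, chunks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    chunkSums<<<1, BLOCK>>>(d_in, d_out, chunks);
    cudaMemcpy(h_out, d_out, chunks * sizeof(int), cudaMemcpyDeviceToHost);

    // Compare each GPU sum with the same sum computed on the host.
    for (int c = 0; c < chunks; ++c) {
        int expect = 0;
        for (int i = 0; i < BLOCK; ++i) expect += h_in[c * BLOCK + i];
        printf("chunk %d: gpu %d, cpu %d\n", c, h_out[c], expect);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If the kernel ends right after the write to g_odata, as the stand-alone reduce6 does, nothing reuses sdata and the trailing barrier should be unnecessary; it only matters when later code in the same kernel writes to sdata again.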
As a separate issue, the manual (B.2.4) talks about volatile, but reduction_kernel.cu does not use it.
Am I right in assuming it is not needed?
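For reference, this is a minimal sketch of where I would expect volatile to matter, namely the last-warp stage that runs without __syncthreads(). The names warpReduce and blockSum are my own, and the sketch assumes a block of exactly 128 threads and hardware on which the 32 threads of a warp execute in lockstep, as I understand to be the case on the GTX 295. It would not be safe as written on hardware that schedules the threads of a warp independently.

```cuda
#include <cuda_runtime.h>

// Last-warp stage of a block reduction, warp-synchronous style.
// ASSUMPTION: the 32 threads of a warp execute in lockstep, and sdata holds
// at least 64 valid entries. The volatile qualifier forces every access to
// go to shared memory; without it the compiler may keep sdata[tid] in a
// register between steps, so other threads would not see the partial sums.
__device__ void warpReduce(volatile int *sdata, unsigned int tid)
{
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid + 8];
    sdata[tid] += sdata[tid + 4];
    sdata[tid] += sdata[tid + 2];
    sdata[tid] += sdata[tid + 1];
}

// Hypothetical kernel, ASSUMING blockDim.x == 128.
__global__ void blockSum(const int *g_idata, int *g_odata)
{
    __shared__ int sdata[128];
    unsigned int tid = threadIdx.x;

    sdata[tid] = g_idata[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Stages that span more than one warp need a full barrier.
    for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Final stages: first warp only, no __syncthreads().
    if (tid < 32) warpReduce(sdata, tid);

    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}
```

The kernel would be launched as blockSum<<<numBlocks, 128>>>(d_in, d_out), with d_in holding numBlocks * 128 ints. Whether the compiler shipped with CUDA 2.3 actually performs the register optimisation that volatile guards against is exactly what I am unsure about.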
ps: CUDA 2.3 Driver 190.42-r2 Linux st127 2.6.29-gentoo-r5
Dr. W. B. Langdon, Department of Computer Science, King's College London, Strand, London, WC2R 2LS, UK http://www.dcs.kcl.ac.uk/staff/W.Langdon/