I have a mask of 496 bits (for extra programming fun it is a triangular matrix)
Typically 30% of the mask is set.
Only for each set bit do I need to do some work.
The current approach is to use a block size of 512 and take the performance hit
of about 363 threads being idle because, either their thread id is above 495
or their bit in the mask is zero.
The next thought was to reduce the block size to 32
and separate the thread id from the work to be done (indexed by p,q).
So each of the 32 threads will be given a p,q index corresponding to
a bit in the mask being set. Once 32 thread warp finishes, we go back to the
mask and find the next collection of 32 set bits,
and so on until we have processed the whole of the mask.
This would take on average about five passes of one warp
(rather than one pass of 16 warps at present).
Given the work to be done is mostly going to be reading global memory
with at present little integer arithmetic, I wanted to know
if other people thought this was worth trying. Even using njuffa’s code,
there will be a fair amount of bit fiddling to convert the bit mask into
a list of (pq) indexes of work to be done.
I am planning to access the 14 arrays in the same order as now.
(The clear bits in the mask typically correspond to elements in these
arrays which are skipped over.)
(Something similar was posted here on 18 Jul 2011, but my problem is smaller
and denser and njuffa had not posted his find_nth_set_bit 12/29/2014 08:48 PM
Oh also, the sides of the mask are 31, so every column of the mask will fit
into 64 bits, so would it be worth converting find_nth_clear_bit from 32
unsigned int to uint64_t ? Or should I shift each column of the mask to start
at the least significant bit of an unsigned int?
As always any help or guidance would be most welcome
Thank you Bill Prof. W. B. Langdon Department of Computer Science University College London Gower Street, London WC1E 6BT, UK http://www.cs.ucl.ac.uk/staff/W.Langdon/
GI-2018 http://geneticimprovementofsoftware.com/ deadline 5th February 2018
2018 Humies http://www.human-competitive.org/call-for-entries
EuroGP 2018 http://www.evostar.org/2018/cfp_eurogp.php
choose your background http://web4.cs.ucl.ac.uk/staff/W.Langdon/colour_telephone/bgcolor.html
A Field Guide to Genetic Programming
GP EM http://www.springer.com/10710
GP Bibliography http://www.cs.bham.ac.uk/~wbl/biblio/