I have been puzzled by the performance of my GPU code, so I tried an
experiment: in the existing code I inserted a conditional return as the
first line of the kernel, with a condition that is always true, so the
kernel now does no work. As expected, the time between
kernel<<<blocks,32>>> and the following gpuErrchk( cudaDeviceSynchronize() );
falls, but only to about 30% of its original value.
I think this means about 30% of my elapsed time is disappearing in just
starting and stopping my kernel. On a GTX 745 with 2000 blocks this
averages about 26.5 microseconds.
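For concreteness, here is a minimal sketch of the sort of measurement
described above. The always-true flag, the gpuErrchk macro body, the
gettimeofday timing and the repetition count are all my assumptions,
not the original code:

#include <cstdio>
#include <cstdlib>
#include <sys/time.h>
#include <cuda_runtime.h>

// Hypothetical error-check macro, assumed to behave like the gpuErrchk
// used above.
#define gpuErrchk(ans) do { cudaError_t e = (ans); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)

// Stand-in for the real kernel: the runtime flag is always 1, so every
// thread returns immediately and the kernel does no work. (A runtime
// flag, rather than "if (true)", stops the compiler deleting the body.)
__global__ void kernel(int flag /* ...real arguments elided... */) {
  if (flag) return;
  // ...original kernel body, never reached in this experiment...
}

int main() {
  const int blocks = 2000, reps = 1000;
  const int flag = 1;

  kernel<<<blocks, 32>>>(flag);          // warm-up: excludes one-off start-up cost
  gpuErrchk(cudaDeviceSynchronize());

  timeval t0, t1;
  gettimeofday(&t0, NULL);
  for (int i = 0; i < reps; ++i) {
    kernel<<<blocks, 32>>>(flag);
    gpuErrchk(cudaDeviceSynchronize());  // wait for the (empty) kernel to finish
  }
  gettimeofday(&t1, NULL);

  double us = ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec)) / reps;
  printf("average launch + synchronise: %.2f microseconds\n", us);
  return 0;
}

Averaging over many launches after a warm-up matters, since the very
first launch also pays one-off context set-up costs.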
Is this what you would expect?
How can it be reduced?
LongY reported about half a microsecond (420 ticks), but that was
measured internally on the GPU and so does not include
cudaDeviceSynchronize() etc.
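For comparison, an on-GPU measurement of that kind might look like the
sketch below. clock64() counts SM clock ticks; the kernel and its names
are my guesses at how such a figure could be obtained, not LongY's code:

#include <cstdio>
#include <cuda_runtime.h>

// Time a stretch of device code in SM clock ticks. The measurement is
// made entirely on the GPU, so host-side launch overhead and
// cudaDeviceSynchronize() are excluded.
__global__ void tick_kernel(long long* ticks) {
  long long start = clock64();
  // ...work to be timed would go here...
  long long stop = clock64();
  if (threadIdx.x == 0 && blockIdx.x == 0) *ticks = stop - start;
}

int main() {
  long long *d_ticks, h_ticks = 0;
  cudaMalloc(&d_ticks, sizeof(long long));
  tick_kernel<<<1, 32>>>(d_ticks);
  cudaDeviceSynchronize();
  cudaMemcpy(&h_ticks, d_ticks, sizeof(long long), cudaMemcpyDeviceToHost);
  printf("%lld SM clock ticks\n", h_ticks);
  cudaFree(d_ticks);
  return 0;
}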
The kernel has 6 scalar arguments (five int, one float) and 7
pointer/array arguments (int*, int*, char*, short* and three int*).
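For completeness, a prototype matching those counts might look like
this (all names invented for illustration; only the types and counts
come from the description above):

// Hypothetical prototype matching the argument counts above.
// On a 64-bit host this is roughly 5*4 + 4 + 7*8 = 80 bytes of launch
// parameters that the driver must marshal at every launch.
__global__ void kernel(int a, int b, int c, int d, int e, float f,
                       int* p0, int* p1, char* p2, short* p3,
                       int* p4, int* p5, int* p6);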
As always, any help or guidance would be most welcome.
Thank you
Bill
Prof. W. B. Langdon
Department of Computer Science
University College London
Gower Street, London WC1E 6BT, UK
http://www.cs.ucl.ac.uk/staff/W.Langdon/