I have been puzzled by the performance of my GPU code, so I tried an
experiment: in the existing code I inserted a conditional return as the
first line of the kernel, with a condition that is always true, so the
kernel now does no work. As expected, the time between kernel<<<blocks,32>>>
and the following gpuErrchk( cudaDeviceSynchronize() ); falls, but only to
about 30% of its original value.
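For anyone who wants to reproduce the experiment, here is a minimal sketch
of what I mean. The kernel name, its single argument and the alwaysTrue
flag are placeholders, not my real code; the timing brackets only the
launch and the following cudaDeviceSynchronize().

  // sketch of the "empty kernel" timing experiment (placeholder names)
  #include <cstdio>
  #include <cstdlib>
  #include <chrono>
  #include <cuda_runtime.h>

  #define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
  inline void gpuAssert(cudaError_t code, const char* file, int line) {
    if (code != cudaSuccess) {
      fprintf(stderr, "GPUassert: %s %s %d\n",
              cudaGetErrorString(code), file, line);
      exit(code);
    }
  }

  __global__ void kernel(int alwaysTrue /*, ... real arguments ... */) {
    if (alwaysTrue) return;   // inserted first line: kernel does no work
    // ... original kernel body ...
  }

  int main() {
    const int blocks = 2000;
    auto t0 = std::chrono::high_resolution_clock::now();
    kernel<<<blocks, 32>>>(1);
    gpuErrchk(cudaDeviceSynchronize());   // host waits for the empty kernel
    auto t1 = std::chrono::high_resolution_clock::now();
    double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    printf("launch + sync: %.1f microseconds\n", us);  // ~26.5 us in my case
    return 0;
  }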
I think this means about 30% of my elapsed time is being spent just
starting and stopping my kernel. On a GTX 745 with 2000 blocks this
averages about 26.5 microseconds.
Is this what you would expect?
How can it be reduced?
LongY reported about half a microsecond (420 ticks), but that was measured
internally on the GPU and so does not include cudaDeviceSynchronize() etc.
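For comparison, a cudaEvent-based measurement (reusing kernel, blocks and
gpuErrchk from the sketch above) brackets only the kernel itself and so is
closer to that kind of internal number than to the host wall-clock time
around cudaDeviceSynchronize(); this is just an assumed sketch, not what
LongY actually ran.

  // sketch: timing the same (empty) kernel with cudaEvents;
  // cudaEventElapsedTime has a resolution of roughly 0.5 microseconds
  cudaEvent_t start, stop;
  gpuErrchk(cudaEventCreate(&start));
  gpuErrchk(cudaEventCreate(&stop));

  gpuErrchk(cudaEventRecord(start));
  kernel<<<blocks, 32>>>(1);
  gpuErrchk(cudaEventRecord(stop));
  gpuErrchk(cudaEventSynchronize(stop));

  float ms = 0.0f;
  gpuErrchk(cudaEventElapsedTime(&ms, start, stop));
  printf("event-timed kernel: %.1f microseconds\n", ms * 1000.0f);

  gpuErrchk(cudaEventDestroy(start));
  gpuErrchk(cudaEventDestroy(stop));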
The kernel has 6 scalar arguments (five int, one float) and 7 pointer/array
arguments (int[31], int[8][5][5], char*, short*, and three int*).
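In case it helps to see the shape of the argument list, a signature with
that mix might look roughly like the following (the names and the exact
array-vs-pointer choices are my guesses, not the real declaration):

  // hypothetical signature: six scalars plus seven pointer/array arguments
  // (the array parameters decay to pointers when the kernel is launched)
  __global__ void kernel(int a, int b, int c, int d, int e, float f,
                         const int  tab31[31],        // int[31]
                         const int  tab855[8][5][5],  // int[8][5][5]
                         char*  cbuf,
                         short* sbuf,
                         int*   out1, int* out2, int* out3);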
As always, any help or guidance would be most welcome.
Thank you
Bill
Prof. W. B. Langdon
Department of Computer Science
University College London
Gower Street, London WC1E 6BT, UK
http://www.cs.ucl.ac.uk/staff/W.Langdon/