I think the need to initialise large arrays comes up often.
Is there any hardware support for this?
(I seem to remember on the DEC VAX computer you could ask for a zero page of memory,
which gave you n*512 bytes all preset to zero.)
On a GPU it would also be nice to be able to preset values other than zero.
This thought was prompted by noticing that, on (half of a) GeForce GTX 295,
cudaMemset clears about 14.5 billion bytes/second,
whereas the SDK bandwidthTest’s cudaMemcpy (cudaMemcpyDeviceToDevice) claims 93 billion bytes/second.
Why the difference?
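
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the three rates could be measured side by side. It is not the SDK's code: the kernel name fill_u32, the helper gbytes_per_sec, the 64 MiB buffer, the repeat count and the 240x256 launch shape are all my own assumptions, and error checking is mostly left out. The fill kernel is also one way to preset a value other than zero. Note that a copy both reads and writes every byte, so it is worth checking whether a benchmark counts the bytes copied or the total bytes moved.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper kernel: set n 32-bit words to an arbitrary value.
// A grid-stride loop lets a fixed-size grid cover any buffer length.
__global__ void fill_u32(unsigned int *dst, unsigned int value, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        dst[i] = value;
}

// Convert (bytes, milliseconds) to billions of bytes per second.
static double gbytes_per_sec(double bytes, double ms)
{
    return bytes / (ms * 1.0e6);
}

int main()
{
    const size_t bytes = (size_t)64 << 20;            // 64 MiB, arbitrary choice
    const size_t words = bytes / sizeof(unsigned int);
    const int reps = 20;                              // arbitrary repeat count

    unsigned int *a = 0, *b = 0;
    if (cudaMalloc((void **)&a, bytes) != cudaSuccess ||
        cudaMalloc((void **)&b, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    cudaMemset(a, 0, bytes);                          // warm-up, not timed

    // 1. cudaMemset: writes only.
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r) cudaMemset(a, 0, bytes);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("cudaMemset        : %.1f GB/s written\n",
           gbytes_per_sec((double)bytes * reps, ms));

    // 2. Hand-written fill kernel with a non-zero value: writes only.
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r) fill_u32<<<240, 256>>>(a, 0xDEADBEEFu, words);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("fill_u32 kernel   : %.1f GB/s written\n",
           gbytes_per_sec((double)bytes * reps, ms));

    // 3. Device-to-device copy: every byte is read once and written once.
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r) cudaMemcpy(b, a, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("cudaMemcpy D2D    : %.1f GB/s copied, %.1f GB/s read+written\n",
           gbytes_per_sec((double)bytes * reps, ms),
           gbytes_per_sec(2.0 * (double)bytes * reps, ms));

    // Sanity check: the copy should carry the fill pattern.
    unsigned int first = 0;
    cudaMemcpy(&first, b, sizeof(first), cudaMemcpyDeviceToHost);
    printf("first word of copy: 0x%08X\n", first);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Built with something like nvcc -O2 on the file, this prints one line per method, so the write-only rates can be compared directly with the copy rate under either byte-counting convention.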
PS: this was originally posted to the Apple forum by mistake :-(