Hello to everyone,
I hope somebody can answer my question, which concerns device-to-device memory transfers. I searched the 3.0 guide and couldn’t find anything.
Is it possible to make data in global memory go through the constant cache, since constant memory also lives in device memory, without transferring it to the host and back?
If so, how can I do it?
I need the result of one kernel to be placed in constant memory for the next kernel.
Thanks in advance,
I think cudaMemcpyToSymbol and cudaMemcpyFromSymbol accept cudaMemcpyDeviceToDevice as an argument. Look at the fifth argument, which is of type cudaMemcpyKind.
So copying from global to constant and from constant to global should be possible, but only in between kernel calls.
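A minimal sketch of the idea above, assuming a small float array and both kernels launched on the default stream. The names produce, consume, c_coeffs, run_example and the size N are made up for illustration and do not come from the original question. The first kernel writes to global memory, cudaMemcpyToSymbol copies that buffer device-to-device into the __constant__ symbol between the two launches, and the second kernel reads it as constant memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4  // hypothetical element count, chosen for illustration

// Constant-memory buffer read by the second kernel (hypothetical name).
__constant__ float c_coeffs[N];

// First kernel: writes its result to ordinary global memory.
__global__ void produce(float* out) {
    int i = threadIdx.x;
    if (i < N) out[i] = 2.0f * i;
}

// Second kernel: reads the previous result through constant memory.
__global__ void consume(float* out) {
    int i = threadIdx.x;
    if (i < N) out[i] = c_coeffs[i] + 1.0f;
}

// Runs both kernels; returns 0 on success and fills host_out with N values.
int run_example(float* host_out) {
    float* d_tmp = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_tmp, N * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) {
        cudaFree(d_tmp);
        return 1;
    }

    produce<<<1, N>>>(d_tmp);

    // Device-to-device copy into the constant symbol, issued between the two
    // launches. The fifth argument is the cudaMemcpyKind; no host buffer is used.
    cudaError_t err = cudaMemcpyToSymbol(c_coeffs, d_tmp, N * sizeof(float), 0,
                                         cudaMemcpyDeviceToDevice);

    if (err == cudaSuccess) {
        consume<<<1, N>>>(d_out);
        // Copy only the final result back to the host for inspection.
        err = cudaMemcpy(host_out, d_out, N * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }

    cudaFree(d_tmp);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    float out[N];
    if (run_example(out) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) printf("%g\n", out[i]);
    return 0;
}
```

Because everything here runs on the default stream, the copy is ordered after the first kernel and before the second, so no explicit synchronization is shown. With multiple streams you would need to order the copy yourself.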
I think the real question here is how to invalidate the constant cache. I guess just getting the address of the symbol and (illegally) poking the values there (e.g. with a special kernel) will work, at least pre-Fermi. Hopefully the constant cache will then be invalidated on the next kernel launch, but I have no knowledge of the internals and have not tried it either.