In my CUDA codes I make extensive use of 1D textures for caching; more specifically, I bind and unbind textures to device pointers quite often. For example:
A kernel computes data from some input arrays, collectively called d_in, to some output arrays, d1out. The data access in this kernel is coalesced.
Then I bind the d1out arrays to textures
these textures are used for cached memory access by another kernel
kernel_2(d2out) (reads from d1out are local but not coalesced -> texture to reduce access latency; d2out writes are coalesced)
then I unbind d1out from textures, and bind d2out
which I use in the final kernel to write the final result to d1out
kernel_3(d1out) (reads from d2out are local but not coalesced -> texture to reduce access latency; d1out writes are coalesced)
and the loop repeats
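To make the pattern concrete, here is a minimal sketch of the bind / kernel / unbind loop. It assumes the legacy texture reference API (cudaBindTexture, tex1Dfetch, cudaUnbindTexture), which was removed in CUDA 12, so it needs an older toolkit. The names tex_in, stencil_from_texture and run_ping_pong are made up for illustration, and one simple neighbour-sum kernel stands in for both kernel_2 and kernel_3. The indices are clamped explicitly so that the sketch does not depend on how out-of-range texture fetches behave.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Legacy texture reference at file scope (hypothetical name; pre-CUDA-12 API).
texture<float, 1, cudaReadModeElementType> tex_in;

// Stand-in for kernel_2 / kernel_3: neighbouring (non-coalesced) reads go
// through the texture, the write to `out` is coalesced.
__global__ void stencil_from_texture(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Clamp explicitly so the sketch never fetches outside [0, n).
        int left  = max(i - 1, 0);
        int right = min(i + 1, n - 1);
        out[i] = tex1Dfetch(tex_in, left) + tex1Dfetch(tex_in, right);
    }
}

// Hypothetical host driver: ping-pongs between d1out and d2out without
// allocating any extra read-only copy of the data.
std::vector<float> run_ping_pong(const std::vector<float>& h_in, int iterations)
{
    const int n = static_cast<int>(h_in.size());
    const size_t bytes = n * sizeof(float);

    float* d1out = nullptr;
    float* d2out = nullptr;
    cudaMalloc(&d1out, bytes);
    cudaMalloc(&d2out, bytes);
    // In the real code d1out would be produced by the first kernel from d_in.
    cudaMemcpy(d1out, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int it = 0; it < iterations; ++it) {
        // Read d1out through the texture, write d2out.
        cudaBindTexture(0, tex_in, d1out, bytes);
        stencil_from_texture<<<blocks, threads>>>(d2out, n);
        cudaUnbindTexture(tex_in);

        // Swap roles: read d2out through the texture, write d1out.
        cudaBindTexture(0, tex_in, d2out, bytes);
        stencil_from_texture<<<blocks, threads>>>(d1out, n);
        cudaUnbindTexture(tex_in);
    }

    std::vector<float> h_out(n);
    // The blocking copy also waits for the kernels queued above.
    cudaMemcpy(h_out.data(), d1out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d1out);
    cudaFree(d2out);
    return h_out;
}
```

Calling run_ping_pong with a host vector and an iteration count returns the contents of d1out after the last pass. Error checking of the CUDA calls is left out to keep the sketch short.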
I have many large 1D arrays that are processed this way, so ideally I would like to avoid allocating any extra read-only memory and instead reuse device memory by binding it to the read-only cache (textures in CUDA).
Another advantage of textures for me is the automatic handling of boundaries: I never get segfaults when I read a few elements before index 0 or a few elements past the end of the array.
Is this trick possible in OpenCL?
On another topic: I also use a lot of function templates in CUDA. Is there a way to do this in OpenCL?
Thanks in advance!