Use a texture sampler on a buffer object

In CUDA, one can use a texture unit to read from any location in global memory using ‘cudaBindTexture’ and ‘tex1Dfetch’. Is it possible to do the same in OpenCL?

I’ve been reading the specification, and it seems that OpenCL makes a distinction between buffer objects and image objects, and that a sampler can only read from image objects. Is there a way either to make an image object use the same memory as an existing buffer object, or to have a kernel use a sampler to read from a buffer object? One solution would be to copy the buffer contents into an image using ‘clEnqueueCopyBufferToImage’, launch the kernel, and then copy the data back again, but I would like to avoid this seemingly pointless copying.

The reason I want to read through a sampler is that I have a kernel implemented in both CUDA and OpenCL, and the CUDA version, which uses textures, performs much better than the OpenCL version, which uses plain array lookups from global memory. Changing the CUDA version to use plain array lookups reduces its performance to that of the OpenCL version. I assume the difference arises because I cannot properly coalesce the memory reads, but the reads are at least somewhat close to each other in memory, so they benefit from the texture cache.

I don’t have an answer for you but I am also quite interested in this topic.

Restricting texture sampling to image objects creates an asymmetric data-management problem that pollutes our compute framework. I would love a zero-copy way to let a buffer in the appropriate format interoperate with the texture sampling hardware.

My reading of the spec is essentially the same: you cannot alias images and buffers by any legal means currently specified. In fact, language in several places (especially sections 5.2, 6.8, and 6.11.8) requires checking for attempts to do so (for example, passing an image as a kernel argument where a buffer is required, or vice versa).

The motivation is most likely code portability, since image/texture formats are likely to be device-specific, so such aliasing should be avoided wherever possible. Yet I also find it frustrating that I cannot get that kind of performance out of OpenCL.

In the one case where I needed to avoid the copy, I simply turned all accesses into image accesses. This does not work if you write back to the object you fetch from. I also had to emulate a large one-dimensional image as a 2D image. It worked, but it all felt quite clumsy compared to CUDA.