Buffer to Image memory transfer performance issue Does clEnqueueCopyBufferToImage imply a pixel conv

As a workaround for not being able to write directly to a 3D texture, I use a clEnqueueCopyBufferToImage memory transfer in my application. But the performance is pretty slow: 7055 MBps, while the CUDA Bandwidth Test achieves 73532 MBps (device to device).

Is there any implicit conversion involving the GPU processor? Is texture memory simply slower? Does it have anything to do with the fact that I'm working with a 3D texture?


PS: My texture is declared as a single-channel float image (CL_R with CL_FLOAT).
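
For reference, here is roughly how a bandwidth figure like the one above can be derived. This is only a sketch: the helper names `image_bytes` and `bandwidth_mbps` are made up for illustration, the timestamps are assumed to be nanosecond values such as those reported by clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) on the copy event, and 1 MB is assumed to be 10^6 bytes, which may not match the convention the CUDA test uses.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of a CL_R / CL_FLOAT 3D image,
   i.e. one 4-byte float per pixel. */
static uint64_t image_bytes(uint64_t width, uint64_t height, uint64_t depth)
{
    return width * height * depth * (uint64_t)sizeof(float);
}

/* Hypothetical helper: bandwidth in MB/s (assuming 1 MB = 1e6 bytes)
   for a transfer of `bytes` bytes between two timestamps in nanoseconds.
   Returns 0.0 if the interval is empty or negative. */
static double bandwidth_mbps(uint64_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns) {
        return 0.0;
    }
    double seconds = (double)(end_ns - start_ns) * 1e-9;
    return ((double)bytes / 1e6) / seconds;
}
```

For example, `bandwidth_mbps(image_bytes(w, h, d), start_ns, end_ns)` gives the rate for one copy of the whole image. Whether a copy counts its bytes once (written) or twice (read plus written) is another convention that can differ between benchmarks, so it is worth checking before comparing the two numbers.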