I’m still new to texture memory, and my first attempt at using it raised a few questions:
It seems that cudaMemcpyToArray() is much slower than a plain cudaMemcpy() that does not involve texture memory. If that is correct, can someone explain this behaviour?
I have two kernel functions that interact in the following way:
inputData -> kernel 1 -> intermediate result (resides on the device) -> kernel 2 -> final result (copied back)
Until now, inputData was stored in global memory. Now it is stored in texture memory, and kernel 1 became slightly faster. Surprisingly, though, kernel 2 is now slower than before. The intermediate result is a separate array in global memory, so I am not reusing the inputData array for it. Does anyone have an idea why this change affects kernel 2?
What is the difference between a CUDA array (1D) and linear memory bound to a texture reference?
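For concreteness, the two variants in the last question look roughly like the sketch below. It is a minimal illustration rather than the actual code: the names (texLinear, texArray, readLinear, readArray) and the 4096-element size are placeholders, and it assumes the legacy texture reference API, which is only available in CUDA toolkits before 12.0. It also shows the two copy calls compared in the first question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a CUDA error and leave main().
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e = (call);                                   \
        if (e != cudaSuccess) {                                   \
            printf("CUDA error: %s\n", cudaGetErrorString(e));    \
            return 1;                                             \
        }                                                         \
    } while (0)

// Texture references must be declared at file scope (legacy API).
texture<float, 1, cudaReadModeElementType> texLinear; // bound to linear memory
texture<float, 1, cudaReadModeElementType> texArray;  // bound to a CUDA array

__global__ void readLinear(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texLinear, i);   // integer index into linear memory
}

__global__ void readArray(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1D(texArray, i + 0.5f);  // float coordinate, texel centre
}

int main()
{
    const int n = 4096;                      // placeholder size
    const size_t bytes = n * sizeof(float);
    static float h_in[n], h_a[n], h_b[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_out;
    CHECK(cudaMalloc(&d_out, bytes));

    // Variant A: linear device memory, filled with cudaMemcpy().
    float *d_lin;
    CHECK(cudaMalloc(&d_lin, bytes));
    CHECK(cudaMemcpy(d_lin, h_in, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaBindTexture(0, texLinear, d_lin, bytes));

    // Variant B: 1D CUDA array, filled with cudaMemcpyToArray().
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *d_arr;
    CHECK(cudaMallocArray(&d_arr, &desc, n));
    CHECK(cudaMemcpyToArray(d_arr, 0, 0, h_in, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaBindTextureToArray(texArray, d_arr));

    const int threads = 256, blocks = (n + threads - 1) / threads;

    readLinear<<<blocks, threads>>>(d_out, n);
    CHECK(cudaMemcpy(h_a, d_out, bytes, cudaMemcpyDeviceToHost));

    readArray<<<blocks, threads>>>(d_out, n);
    CHECK(cudaMemcpy(h_b, d_out, bytes, cudaMemcpyDeviceToHost));

    // Both variants should return the same values as the input.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != h_in[i] || h_b[i] != h_in[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaUnbindTexture(texLinear));
    CHECK(cudaUnbindTexture(texArray));
    CHECK(cudaFreeArray(d_arr));
    CHECK(cudaFree(d_lin));
    CHECK(cudaFree(d_out));
    return 0;
}
```

In this sketch the visible API difference is that the linear-memory texture is read with tex1Dfetch() and an integer index, while the CUDA array is read with tex1D() and a floating-point coordinate. Wrapping each of the two copy calls in cudaEventRecord() / cudaEventElapsedTime() would be one way to check the timing claim from the first question.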