I have some data declared like this:
double data[64][64][64];
Since each thread in a block will access [0][y], then [1][y], and so on, I decided to use a texture to take advantage of the texture cache.
So I wrote something like this:
[codebox]cudaArray *da_COSMAG;
texture<int2,2> t_COSMAG;
…
NM=64;NVEC=64;
…
// kernel function that fills d_COSMAG
…
cudaMallocArray(&da_COSMAG, &t_COSMAG.channelDesc, NM, NM);
cudaBindTextureToArray(t_COSMAG, da_COSMAG);
cudaMemcpy2DToArray(da_COSMAG, 0, 0, d_COSMAG, NM*NM*NVEC*sizeof(int2), NM*NVEC*sizeof(int2), NM*sizeof(int2),
                    cudaMemcpyDeviceToDevice);
status = cudaGetLastError();
[/codebox]
The call returns:
invalid pitch argument
I saw in another post that the pitch is limited to 2^16 * 4 bytes, which would mean I can only create a texture of at most 262144 bytes. Is this right?
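To make the numbers concrete, here is a minimal host-only sketch of the byte values that the arguments in the call above work out to. It assumes `int2` is 8 bytes (two 32-bit ints); `int2_standin` and the helper names are hypothetical, introduced only so this compiles without the CUDA headers. The 2^16 * 4 limit is the figure quoted from the other post and is not verified here.

```c
#include <stddef.h>

/* Stand-in for CUDA's int2 (assumed: two 32-bit ints, 8 bytes total). */
typedef struct { int x, y; } int2_standin;

/* Pitch limit as quoted from the other post: 2^16 * 4 bytes (unverified). */
static size_t quoted_pitch_limit(void) { return ((size_t)1 << 16) * 4; }

/* Values passed to cudaMemcpy2DToArray in the call above, named after the
   parameters they land in: (dst, wOffset, hOffset, src, spitch, width, height, kind). */
static size_t spitch_arg(size_t nm, size_t nvec) { return nm * nm * nvec * sizeof(int2_standin); }
static size_t width_arg(size_t nm, size_t nvec)  { return nm * nvec * sizeof(int2_standin); }
static size_t height_arg(size_t nm)              { return nm * sizeof(int2_standin); }

/* Bytes in one row of the array allocated by cudaMallocArray(..., NM, NM). */
static size_t array_row_bytes(size_t nm) { return nm * sizeof(int2_standin); }
```

With NM = NVEC = 64, `spitch_arg(64, 64)` is 2097152 bytes, eight times the quoted 262144-byte limit, and `width_arg(64, 64)` is 32768 bytes while one row of the 64 x 64 array holds only 512 bytes.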
If I split my texture into several subtextures, would that solve the problem? If so, how can I split it dynamically? In general, I would like to be able to change the size of this table depending on input parameters.
Thanks