In my case I will read from scattered parts of memory, but the reads (gathers) have some spatial locality, so I think the texture cache is a good fit.
Slightly off topic, but I need some direction on how to create a 2D texture of float4s.
Will my cudaArray's elements contain float4s with the code below?
#define TEX_WIDTH 256
#define TEX_HEIGHT 256
float h_bottom[TEX_HEIGHT][TEX_WIDTH*4];
float h_top[TEX_HEIGHT][TEX_WIDTH*4];
float h_front[TEX_HEIGHT][TEX_WIDTH*4];
float h_back[TEX_HEIGHT][TEX_WIDTH*4];
float h_left[TEX_HEIGHT][TEX_WIDTH*4];
float h_right[TEX_HEIGHT][TEX_WIDTH*4];
//fill arrays with data
cudaStream_t stream[7];
for (int i = 0; i < 7; ++i)
    cudaStreamCreate(&stream[i]);
//perform an async memcpy from host data to cudaArray
cudaArray *d_bo, *d_t, *d_f, *d_ba, *d_l, *d_r;
cudaMalloc3DArray(&d_bo, &channelDesc, extent);
cudaMalloc3DArray(&d_t, &channelDesc, extent);
cudaMalloc3DArray(&d_f, &channelDesc, extent);
cudaMalloc3DArray(&d_ba, &channelDesc, extent);
cudaMalloc3DArray(&d_l, &channelDesc, extent);
cudaMalloc3DArray(&d_r, &channelDesc, extent);
//channelDesc is {32, 32, 32, 32, cudaChannelFormatKindFloat} //is it ok to
cudaMemcpy2DToArrayAsync(d_bo, 0, 0, h_bottom, sizeof(float)*4*extent.width, sizeof(float)*4*extent.width,
extent.height,cudaMemcpyHostToDevice, stream[1]);
cudaMemcpy2DToArrayAsync(d_t, 0, 0, h_top, sizeof(float)*4*extent.width, sizeof(float)*4*extent.width,
extent.height,cudaMemcpyHostToDevice, stream[2]);
cudaMemcpy2DToArrayAsync(d_f, 0, 0, h_front, sizeof(float)*4*extent.width, sizeof(float)*4*extent.width,
extent.height,cudaMemcpyHostToDevice, stream[3]);
cudaMemcpy2DToArrayAsync(d_ba, 0, 0, h_back, sizeof(float)*4*extent.width, sizeof(float)*4*extent.width,
extent.height,cudaMemcpyHostToDevice, stream[4]);
cudaMemcpy2DToArrayAsync(d_l, 0, 0, h_left, sizeof(float)*4*extent.width, sizeof(float)*4*extent.width,
extent.height,cudaMemcpyHostToDevice, stream[5]);
cudaMemcpy2DToArrayAsync(d_r, 0, 0, h_right, sizeof(float)*4*extent.width, sizeof(float)*4*extent.width,
extent.height,cudaMemcpyHostToDevice, stream[6]);
Do the channelDesc elements have to match the Type given when the texture reference is declared?
texture<Type, Dim, ReadMode> texRef;
My texture ref has Type float4. Does my channelDesc structure therefore have to have the w component set to 32?
Also, is my host array declaration OK for cudaMemcpy2DToArrayAsync with the given pitch? The spitch parameter should be the same as the width in bytes (no row padding), am I correct?
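To make the pitch question concrete, here is a minimal host-only sketch of the layout I am assuming. It does not call the CUDA runtime at all: `Texel` is a hypothetical stand-in for float4, and `row_pitch_bytes`, `fill_pattern` and `load_texel` are helper names I made up for illustration. It only checks that a tightly packed `float[TEX_HEIGHT][TEX_WIDTH*4]` row has the same size in bytes as TEX_WIDTH float4-sized texels, and that texel (x, y) starts at float index 4*x of row y.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t TEX_WIDTH  = 256;
constexpr std::size_t TEX_HEIGHT = 256;

// Hypothetical stand-in for CUDA's float4: four packed 32-bit floats.
struct Texel { float x, y, z, w; };
static_assert(sizeof(Texel) == 4 * sizeof(float), "Texel must have no padding");

// Size in bytes of one tightly packed row of width_texels texels.
// With no row padding, the source pitch and the copy width in bytes are both this value.
constexpr std::size_t row_pitch_bytes(std::size_t width_texels) {
    return width_texels * sizeof(Texel);
}

// Same layout as the host arrays in the post: four floats per texel, row-major.
static float h_bottom[TEX_HEIGHT][TEX_WIDTH * 4];

// The float rows are exactly one pitch apart, so there is no padding between rows.
static_assert(sizeof(h_bottom[0]) == row_pitch_bytes(TEX_WIDTH),
              "row size must equal the pitch");

// Fill with a recognisable pattern: x = column, y = row, z = 0, w = 1.
void fill_pattern(float (*rows)[TEX_WIDTH * 4]) {
    for (std::size_t y = 0; y < TEX_HEIGHT; ++y) {
        for (std::size_t x = 0; x < TEX_WIDTH; ++x) {
            rows[y][4 * x + 0] = static_cast<float>(x);
            rows[y][4 * x + 1] = static_cast<float>(y);
            rows[y][4 * x + 2] = 0.0f;
            rows[y][4 * x + 3] = 1.0f;
        }
    }
}

// Read texel (x, y) out of the float rows; memcpy avoids aliasing problems.
Texel load_texel(const float (*rows)[TEX_WIDTH * 4], std::size_t x, std::size_t y) {
    assert(x < TEX_WIDTH && y < TEX_HEIGHT);
    Texel t;
    std::memcpy(&t, &rows[y][4 * x], sizeof(Texel));
    return t;
}
```

Under these assumptions `row_pitch_bytes(TEX_WIDTH)` is 4096 bytes, which is the same value as `sizeof(float)*4*extent.width` in the copies above when extent.width is 256. Whether the runtime accepts that value for both spitch and width is still the part I would like confirmed.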