Forced to use high-dim textures due to tex1D size limits, linear indexing still possible?

Kasper_Marstal · May 30, 2010, 5:39pm

Hello,

I’m doing medical image processing and would like to apply neighborhood-based operations to a 3D image. I’ve got a 3D volume that I would like to access using linear indices (that are computed in a higher-level language). In my algorithm, the voxel values are used frequently, so I would like to access them through textures. At first, I allocated the image as a one-dimensional texture but was forced to abandon this approach due to tex1D size limits. I can’t really use tex3D because it doesn’t handle linear indices. I imagine I’m not the only one facing this problem. Could anyone advise me on possible solutions? What have you worked out to overcome this?
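One way to keep linear indices while still reading through tex3D is to convert each index into coordinates first. The sketch below assumes 0-based indices and a layout in which x varies fastest, that is idx = x + nx * (y + ny * z); neither is stated in the post. `Voxel3`, `linearToVoxel`, `voxelToLinear` and the `HOST_DEVICE` macro are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Expands to CUDA qualifiers under nvcc and to nothing in plain C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper type: integer voxel coordinates.
struct Voxel3 {
    std::size_t x, y, z;
};

// Convert a 0-based linear index into (x, y, z).
// Assumption: x varies fastest, i.e. idx = x + nx * (y + ny * z).
HOST_DEVICE inline Voxel3 linearToVoxel(std::size_t idx,
                                        std::size_t nx, std::size_t ny) {
    assert(nx > 0 && ny > 0);
    Voxel3 v;
    v.x = idx % nx;         // position within a row
    v.y = (idx / nx) % ny;  // row within a slice
    v.z = idx / (nx * ny);  // slice within the volume
    return v;
}

// Inverse mapping, under the same layout assumption.
HOST_DEVICE inline std::size_t voxelToLinear(Voxel3 v,
                                             std::size_t nx, std::size_t ny) {
    return v.x + nx * (v.y + ny * v.z);
}
```

If the indices come from a 1-based language, subtract 1 before converting. In a kernel, the resulting coordinates would typically be passed to tex3D as floats, offset by 0.5f so they address texel centers when unnormalized coordinates are used.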

Nighthawk13 · May 30, 2010, 8:27pm

tex1D should allow indices up to 2^27-1 when the texture is bound to linear memory, but only up to 8191 when it is bound to a CUDA array.

So switch to linear mem if you have not yet.
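Whether a given volume fits under that limit is a quick calculation. The sketch below uses the 2^27 figure quoted above as an assumption; the real limit depends on the device and should be checked for the hardware in use. `kMaxLinearTexels` and `fitsLinearTexture` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumed element limit for a 1D texture bound to linear memory,
// taken from the figure quoted in this thread (indices 0 .. 2^27-1).
const std::size_t kMaxLinearTexels = std::size_t(1) << 27;

// Returns true if an nx * ny * nz volume has few enough voxels
// to be bound as a single 1D texture under the assumed limit.
inline bool fitsLinearTexture(std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * nz <= kMaxLinearTexels;
}
```

For example, a 512 x 512 x 512 volume has exactly 2^27 voxels and just fits, while adding a single extra slice pushes it over.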

When 2^27 is too small, you could load often used data into shared mem from global mem instead of using texture access.

Or use 2D textures, as you said.
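With a 2D texture, a linear index can still be used by splitting it into a column and a row. This is a minimal sketch, assuming 0-based indices and a row width chosen to stay within the device's 2D texture limits; `Texel2`, `linearToTexel2D`, `rowsNeeded` and the `HOST_DEVICE` macro are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Expands to CUDA qualifiers under nvcc and to nothing in plain C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper type: integer texel coordinates in a 2D texture.
struct Texel2 {
    std::size_t x, y;
};

// Split a 0-based linear index into (column, row) for a texture
// that stores `width` elements per row.
HOST_DEVICE inline Texel2 linearToTexel2D(std::size_t idx, std::size_t width) {
    assert(width > 0);
    Texel2 t;
    t.x = idx % width;  // column within the row
    t.y = idx / width;  // row number
    return t;
}

// Number of rows needed to hold `numel` elements, rounding up
// so that a partially filled last row is still allocated.
inline std::size_t rowsNeeded(std::size_t numel, std::size_t width) {
    assert(width > 0);
    return (numel + width - 1) / width;
}
```

The host would allocate a `width` x `rowsNeeded(numel, width)` texture and copy the volume in row by row; the last row may be only partly filled, so the kernel should not read past `numel`.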

Kasper_Marstal · May 30, 2010, 10:45pm

Thank you nighthawk! I tried to use linear memory. However, I can’t make it work right. The following are the relevant lines of my program…

texture<float, 1, cudaReadModeElementType> texD;

float* d_D;
size_t numel_D_in_bytes = siz_D[0] * siz_D[1] * siz_D[2] * sizeof(float);

// Allocate MR image
isError = cudaMalloc(&d_D, numel_D_in_bytes);
checkError(isError, numel_D_in_bytes, memory_free, memory_total, "d_D ALLOC");

// Copy MR image to the device
isError = cudaMemcpy(d_D, h_D, numel_D_in_bytes, cudaMemcpyHostToDevice);
checkError(isError, numel_D_in_bytes, memory_free, memory_total, "d_D MEMCPY");

// Bind texture to MR image array and set texture parameters
isError = cudaBindTexture(0, texD, d_D, texD.channelDesc, numel_D_in_bytes);
checkError(isError, numel_D_in_bytes, memory_free, memory_total, "d_D BIND TEXTURE");
texD.normalized = false;
texD.addressMode[0] = cudaAddressModeClamp;
texD.addressMode[1] = cudaAddressModeClamp;

To try it out, I made a kernel like

if (M < 1 && N < 1) {
  d_SIMAPS[0] = tex1D(texD, 1);
}

but nothing happens, I get the default 0 at d_SIMAPS[0]. On the other hand,

if (M < 1 && N < 1) {
  d_SIMAPS[0] = 1;
}

returns what is expected, so something is wrong with the texture. What am I doing wrong?

pium · May 31, 2010, 7:02am

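For a texture bound to linear memory, I think you should use tex1Dfetch instead of tex1D. tex1Dfetch takes an integer element index, while tex1D takes a float coordinate and is meant for textures bound to CUDA arrays.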

Kasper_Marstal · May 31, 2010, 4:07pm

Thanks pium! Solved!