Using Textures

I am trying to map some data to a texture and am having difficulty understanding the API documentation. I basically have a large piece of data that I want stored as a 2-D texture, with one 32-bit float element per coordinate. I understand that a texture is good for this since I have 2-D spatial locality.

I have loaded the data to the device and have a pointer to that memory, and I want to make it a texture by using cudaBindTexture. (Is this what I am supposed to do?)

texture<float, 2, cudaReadModeElementType> image;
struct cudaChannelFormatDesc myChanFormDesc =
    cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

cudaBindTexture(image, data, myChanFormDesc, size, 0);

^ This call is what confuses me. Where I put "image", the docs say to use a char* symbol - what is this piece of information? I assume it is not something that I define arbitrarily, but I have not found an API call that assigns one.

I also need to pass the texture reference to the device to use the texture data. I get a compiler complaint when I try to pass “image” as the parameter to the kernel.

I know I am making some simple mistakes, just can’t seem to find what I need in the docs very easily…

Here’s a simple snippet of some old test code I wrote that uses 2-D textures. Ignore everything but the texture stuff since this is a very slow version of the code.

....

// declare texture reference for the 2-D float4 texture
texture<float4, 2, cudaReadModeElementType> tex;

__global__ static void cenergy(int numatoms, float gridspacing, int z, float *energygrid) {
  int xindex  = (blockIdx.x * blockDim.x) + threadIdx.x;
  int yindex  = (blockIdx.y * blockDim.y) + threadIdx.y;
  int outaddr = (gridDim.x * blockDim.x * yindex) + xindex;

  float3 coor;
  coor.x = gridspacing * xindex;
  coor.y = gridspacing * yindex;
  coor.z = gridspacing * z;

  int atomid, tx, ty;
  float energyval = 0.0f;

  for (atomid=0,tx=0,ty=0; atomid < numatoms; ty++) {
    for (tx=0; tx < TEXROWSIZE && atomid < numatoms; tx++, atomid++) {
      float4 atominfo = texfetch(tex, tx, ty);
      float dx = coor.x - atominfo.x;
      float dy = coor.y - atominfo.y;
      float dz = coor.z - atominfo.z;
      energyval += atominfo.w / sqrtf(dx*dx + dy*dy + dz*dz);
    }
  }

  energygrid[outaddr] = energyval;
}

int copyatomstotexture(float *atoms, int count, cudaArray **darray) {
  printf("tex: %d %d %d %d %d\n",
         tex.channelDesc.x,
         tex.channelDesc.y,
         tex.channelDesc.z,
         tex.channelDesc.w,
         tex.channelDesc.f);

  cudaMallocArray(darray, &tex.channelDesc, TEXROWSIZE, (count / TEXROWSIZE));
  cudaMemcpyToArray(*darray, 0, 0, atoms, count*4*sizeof(float), cudaMemcpyHostToDevice);

  // set texture parameters
  tex.addressMode[0] = cudaAddressModeClamp;
  tex.addressMode[1] = cudaAddressModeClamp;
  tex.filterMode = cudaFilterModePoint;
  tex.normalized = false; // do not normalize coordinates

  // Bind the array to the texture
  cudaBindTexture(tex, *darray);

  return 0;
}

.......

Note: The count parameter for cudaMemcpyToArray() appears to be a number of bytes. The docs list it as a number of array elements.

Good catch, thanks. We’ll fix the doc.
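In the meantime, here is a minimal call consistent with the count-in-bytes behaviour (N and h_atoms are placeholders, not names from the code above):

cudaArray *atomArray;
cudaChannelFormatDesc desc = cudaCreateChannelDesc(32, 32, 32, 32, cudaChannelFormatKindFloat);
cudaMallocArray(&atomArray, &desc, N, 1);         // one row of N float4 texels
cudaMemcpyToArray(atomArray, 0, 0, h_atoms,
                  N * sizeof(float4),             // count is in BYTES, not elements
                  cudaMemcpyHostToDevice);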

I am still curious how I can use the other call to cudaBindTexture that does not use an array (the other one in the doc). How is this call used?

Use cudaMalloc2D or cudaMalloc. In either case, you need to use 1D addressing in your texfetch() call and declare your texture to be 1D (only CUDA arrays allocated with cudaMallocArray work as 2D textures). When you use cudaMalloc2D, you have to use the pitch returned by cudaMalloc2D in the addressing, something like this:

texture<float> input_tex;

__global__
void kernel_texture(float *output_d, int pitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // pitch is the byte pitch from cudaMalloc2D; pitch>>2 is the pitch in float elements
    output_d[y*(pitch>>2) + x] = 2 * texfetch(input_tex, y*(pitch>>2) + x);
}

That pitch>>2 is important. This is basically the pitch in elements rather than in bytes. So if your elements are floats, then they are 4 bytes, so you shift right 2 bits. But if your elements are shorts, then you need to shift right only 1 bit.
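To round that out, here is a rough host-side sketch of the linear-memory binding, mirroring the argument order used earlier in this thread (width, height and h_data are placeholders, and the exact cudaMalloc2D/cudaBindTexture signatures may differ between releases, so treat this as a sketch rather than a reference):

float *input_d, *output_d;
size_t pitch;                                              // returned in bytes
cudaMalloc2D((void**)&input_d,  &pitch, width * sizeof(float), height);
cudaMalloc2D((void**)&output_d, &pitch, width * sizeof(float), height);

// copy a row-major host image into the pitched allocation
cudaMemcpy2D(input_d, pitch, h_data, width * sizeof(float),
             width * sizeof(float), height, cudaMemcpyHostToDevice);

// bind the pitched linear memory to the 1D texture reference
cudaBindTexture(input_tex, input_d, input_tex.channelDesc, pitch * height, 0);

dim3 block(16, 16);
dim3 grid(width / block.x, height / block.y);
kernel_texture<<<grid, block>>>(output_d, (int)pitch);     // pass the pitch in bytes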

Note also that, due to memory layout, linear-memory texture references will not benefit from the texture cache as much as CUDA array texture references do.

Mark

Hi

If I look at the doc, 2D textures can be of integer types. But when I change the float type to unsigned short or unsigned int in the simpleTexture example and launch it in EmuRelease mode, the kernel blocks without doing anything. Even a printf("here\n"); at the beginning of the kernel is not printed. Have I forgotten something? Can you help me?

Here is the modified kernel code with unsigned short:

#ifndef SIMPLETEXTURE_KERNEL_H
#define SIMPLETEXTURE_KERNEL_H

// declare texture reference for 2D float texture
texture<unsigned short, 2, cudaReadModeElementType> tex;

////////////////////////////////////////////////////////////////////////////////
//! Transform an image using texture lookups
//! @param g_odata output data in global memory
////////////////////////////////////////////////////////////////////////////////
__global__ void
transformKernel(unsigned short* g_odata, int width, int height, float theta)
{
printf("here\n");
// calculate normalized texture coordinates
unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

float u = x / (float) width;
float v = y / (float) height;

// transform coordinates
u -= 0.5;
v -= 0.5;
float tu = u*cos(theta) - v*sin(theta) + 0.5;
float tv = v*cos(theta) + u*sin(theta) + 0.5;

// read from texture and write to global memory
g_odata[y*width + x] = texfetch(tex, tu, tv);

}

#endif // #ifndef SIMPLETEXTURE_KERNEL_H

and the main code:

// includes, system
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// includes, project
#include <cutil.h>

// includes, kernels
#include <simpleTexture_kernel.cu>

char *image_filename = "lena_bw.pgm";
float angle = 0.5f; // angle to rotate image by (in radians)

////////////////////////////////////////////////////////////////////////////////
// declaration, forward
void runTest( int argc, char** argv);

extern "C"
void computeGold( float* reference, float* idata, const unsigned int len);

////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main( int argc, char** argv)
{
runTest( argc, argv);

CUT_EXIT(argc, argv);

}

////////////////////////////////////////////////////////////////////////////////
//! Run a simple test for CUDA
////////////////////////////////////////////////////////////////////////////////
void
runTest( int argc, char** argv)
{
// load image from disk
unsigned short* h_data = NULL;
unsigned int width, height;
char* image_path = cutFindFilePath(image_filename, argv[0]);
if (image_path == 0)
exit(EXIT_FAILURE);
CUT_SAFE_CALL( cutLoadPGMs(image_path, &h_data, &width, &height));

unsigned int size = width * height * sizeof(unsigned short);
printf("Loaded '%s', %d x %d pixels\n", image_filename, width, height);

// allocate device memory for result
unsigned short* d_data = NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_data, size));

// allocate array and copy image data
cudaArray* cu_array;
CUDA_SAFE_CALL( cudaMallocArray( &cu_array, &tex.channelDesc, width, height )); 
CUDA_SAFE_CALL( cudaMemcpy( cu_array, h_data, size, cudaMemcpyHostToDevice));

// set texture parameters
tex.addressMode[0] = cudaAddressModeWrap;
tex.addressMode[1] = cudaAddressModeWrap;
tex.filterMode = cudaFilterModeLinear;
tex.normalized = true;    // access with normalized texture coordinates

// Bind the array to the texture
cudaBindTexture( tex, cu_array);

dim3 dimBlock(8, 8, 1);
dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);

// warmup
transformKernel<<< dimGrid, dimBlock, 0 >>>( d_data, width, height, angle);

unsigned int timer = 0;
CUT_SAFE_CALL( cutCreateTimer( &timer));
CUT_SAFE_CALL( cutStartTimer( timer));

// execute the kernel
transformKernel<<< dimGrid, dimBlock, 0 >>>( d_data, width, height, angle);

// check if kernel execution generated an error
CUT_CHECK_ERROR("Kernel execution failed");

CUT_SAFE_CALL( cutStopTimer( timer));
printf("Processing time: %f (ms)\n", cutGetTimerValue( timer));
printf("%.2f Mpixels/sec\n", (width*height / (cutGetTimerValue( timer) / 1000.0f)) / 1e6);
CUT_SAFE_CALL( cutDeleteTimer( timer));

// allocate mem for the result on host side
unsigned short* h_odata = (unsigned short*) malloc( size);
// copy result from device to host
CUDA_SAFE_CALL( cudaMemcpy( h_odata, d_data, size, cudaMemcpyDeviceToHost) );

// write result to file
char output_filename[1024];
strcpy(output_filename, image_path);
strcpy(output_filename + strlen(image_path) - 4, "_out.pgm");
CUT_SAFE_CALL( cutSavePGMs( output_filename, h_odata, width, height));
printf("Wrote '%s'\n", output_filename);

// write regression file if necessary
if( cutCheckCmdLineFlag( argc, (const char**) argv, "regression")) 
{
    // write file for regression test
  //  CUT_SAFE_CALL( cutWriteFilef( "./data/regression.dat", h_odata, width*height, 0.0));

}

// cleanup memory
CUDA_SAFE_CALL(cudaFree(d_data));
CUDA_SAFE_CALL(cudaFreeArray(cu_array));
free(h_data);
free(h_odata);
cutFree(image_path);

}

thanks for help!

Have you tried cudaMemcpyToArray() instead of cudaMemcpy()?

Odd – the SDK’s sample code (simpleTexture.cu) uses cudaMemcpy() to copy host data to an array, instead of cudaMemcpyToArray()…

Is that allowed? or recommended? :unsure:

P.S. I for one am still unclear on the usage subtleties of cudaMemcpy() vs. cudaMemcpy2D() vs. cudaMemcpyToArray() vs. cudaMemcpy2DToArray(). The documentation and SDK sample code are pretty terse (and sometimes wrong!) on using these for initializing and using textures. A primer with a few examples (a sticky thread from one of the CUDA development team?) would go a long way toward clarifying the topic. Thanks in advance!
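For concreteness, here is my current (possibly wrong) understanding of how the four calls differ, assuming a width x height float image; h_img, d_linear, d_pitched, d_pitch and cu_array are placeholders, and the widths/pitches for the 2-D variants are in bytes:

// contiguous linear memory: plain byte count
cudaMemcpy(d_linear, h_img, width * height * sizeof(float), cudaMemcpyHostToDevice);

// pitched linear memory: dest pitch, source pitch, row width in bytes, number of rows
cudaMemcpy2D(d_pitched, d_pitch, h_img, width * sizeof(float),
             width * sizeof(float), height, cudaMemcpyHostToDevice);

// CUDA array from a contiguous source: plain byte count
cudaMemcpyToArray(cu_array, 0, 0, h_img, width * height * sizeof(float), cudaMemcpyHostToDevice);

// CUDA array from a pitched source: source pitch, row width in bytes, number of rows
cudaMemcpy2DToArray(cu_array, 0, 0, h_img, width * sizeof(float),
                    width * sizeof(float), height, cudaMemcpyHostToDevice);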

I’ve just tried cudaMemcpyToArray(); the result is the same as with cudaMemcpy() (it works with float but not with unsigned short).

I’ve also tried it on the board (release mode), and it’s the same.

Does somebody have an example that works with a texture array of unsigned short or int?

:(

I inserted the line:

// check if kernel execution generated an error

CUT_CHECK_ERROR("Kernel execution failed");

after the call to transformKernel(), and compiled with -D_DEBUG.

I strongly recommend checking for errors after every CUDA call, and compiling with -D_DEBUG. You’ll catch a lot of problems that way.
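If you don’t want to depend on cutil, a bare-bones check along these lines works too (a sketch; it assumes cudaThreadSynchronize() and cudaGetLastError() are available in your release):

// minimal error check after a kernel launch, no cutil required
transformKernel<<<dimGrid, dimBlock, 0>>>(d_data, width, height, angle);
cudaThreadSynchronize();                       // wait for the kernel to finish
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
    printf("CUDA error: %s\n", cudaGetErrorString(err));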

The resulting error is:

Cuda error: Kernel execution failed in file ‘cuTestHouzet.cu’ in line 110 : linear filtering not supported for non-float type.

This is mentioned on p. 24 of the CUDA Programming Guide (v0.8): “cudaFilterModeLinear is only valid for returned values of floating-point type.”
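So for an unsigned short texture you would either drop to point sampling, or keep linear filtering and have the hardware return normalized floats instead. Roughly one of these (a sketch against your modified sample, not something I have run):

// option 1: keep cudaReadModeElementType, but switch to point sampling
tex.filterMode = cudaFilterModePoint;

// option 2: keep cudaFilterModeLinear, but declare the texture to return
// normalized floats in [0,1] (texfetch() then returns float, not unsigned short):
// texture<unsigned short, 2, cudaReadModeNormalizedFloat> tex;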

Thanks very much. As I see, error messages are really useful!

Dominique