Setting up for tex1D(): How to load a CUDA Array for tex1D()

Alec · November 19, 2007, 4:03pm

I want to use tex1D() to interpolate between adjacent entries in a constant coefficient table. I understand that this must be done from a CUDA Array. But I don’t see how to transfer data to a 1D array so that tex1D() can look it up.

I set up a 1D texture:
texture<float, 1, cudaReadModeElementType> filterTex ;

Later, I tweak it to do the linear interpolation:
filterTex.filterMode = cudaFilterModeLinear ;
I don’t see how I can do this at compile time, so I presume I have to do it at run time.

I set up my format description
cudaChannelFormatDesc formatDesc = cudaCreateChannelDesc<float>() ;

I create a CUDA Array and bind it to my Texture
cudaArray* cuArray ;
cudaCall (cudaMallocArray (&cuArray, &formatDesc, WIDTH, 1)) ;
cudaCall (cudaBindTextureToArray (filterTex, cuArray)) ;

Now I want to copy my coefficients into the array. But the only ways I can see to do this are cudaMemcpyToArray() and cudaMemcpy2DToArray(), both of which appear to be 2-D operations. If I try to “fool” them by giving a height of 1, they give me an Invalid Parameter error.

I am obviously missing something here. Any suggestions?

Alec

MisterAnderson42 · November 19, 2007, 4:30pm

It’s been a while since I did this (so apologies in advance if I don’t remember correctly), but I think you need to set the height to 0.

Alec · November 19, 2007, 4:44pm

That moved me on one stage: it got rid of the Invalid Parameter error for a short while. Unfortunately, I had got the count parameter wrong: I had given it in floats, not bytes. When I corrected this, the Invalid Parameter error came back. While it was “working”, the result was mathematically far from right: a constant value whatever I put in.
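The floats-versus-bytes mix-up can be reproduced on the host, because std::memcpy takes its count in bytes just as the count argument of cudaMemcpyToArray() does. The following is only a minimal sketch of that pitfall, not the code from this thread; the helper name copyFloats is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies countBytes BYTES from src to dst, the same unit
// that memcpy-style calls expect, and returns how many whole floats arrived.
std::size_t copyFloats(std::vector<float>& dst,
                       const std::vector<float>& src,
                       std::size_t countBytes)
{
    // Refuse to read or write past either buffer.
    assert(countBytes <= dst.size() * sizeof(float));
    assert(countBytes <= src.size() * sizeof(float));

    std::memcpy(dst.data(), src.data(), countBytes);
    return countBytes / sizeof(float);
}
```

Passing the element count (say 8) where bytes are expected copies only 8 / sizeof(float) floats and leaves the rest of the destination untouched. Passing 8 * sizeof(float) copies all eight.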

MisterAnderson42 · November 19, 2007, 7:56pm

I was wrong about the height=0 thing. That was for a very old version of CUDA.

A working code example is worth a thousand words:

#include <stdio.h>
#include <stdlib.h>

// Abort with a message if a CUDA runtime call fails
#define CUDA_SAFE_CALL( call) do {                                         \
    cudaError_t err = call;                                                  \
    if( cudaSuccess != err) {                                                \
        fprintf(stderr, "Cuda error in file '%s' in line %i : %s.\n",        \
                __FILE__, __LINE__, cudaGetErrorString( err) );              \
        exit(EXIT_FAILURE);                                                  \
    } } while (0)

// Wait for outstanding GPU work, then abort if an error was recorded
#define CUT_CHECK_ERROR(errorMessage) do {                                 \
    cudaThreadSynchronize();                                                 \
    cudaError_t err = cudaGetLastError();                                    \
    if( cudaSuccess != err) {                                                \
        fprintf(stderr, "Cuda error: %s in file '%s' in line %i : %s.\n",    \
                errorMessage, __FILE__, __LINE__, cudaGetErrorString( err) );\
        exit(EXIT_FAILURE);                                                  \
    } } while (0)

texture<float, 1, cudaReadModeElementType> tex;

// Each thread samples the texture at an unnormalized coordinate in [0, M)
__global__ void kernel(int M, float *d_out)
    {
    float v = float(threadIdx.x) / float(blockDim.x) * float(M);
    float x = tex1D(tex, v);
    //printf("%f\n", x); // for deviceemu testing
    d_out[threadIdx.x] = x;
    }

int main()
    {
    int N = 256;

    // memory for output
    float *d_out;
    CUDA_SAFE_CALL( cudaMalloc((void**)&d_out, sizeof(float) * N) );

    // make an array half the size of the output
    int M = N/2;
    cudaArray* cuArray;
    CUDA_SAFE_CALL (cudaMallocArray (&cuArray, &tex.channelDesc, M, 1));
    CUDA_SAFE_CALL (cudaBindTextureToArray (tex, cuArray));

    tex.filterMode = cudaFilterModeLinear;

    // fill the array with increasing values; the copy size is given in bytes
    float *data = (float*)malloc(M*sizeof(float));
    for (int i = 0; i < M; i++)
        data[i] = float(i);
    CUDA_SAFE_CALL( cudaMemcpyToArray(cuArray, 0, 0, data, sizeof(float)*M, cudaMemcpyHostToDevice) );

    kernel<<<1, N>>>(M, d_out);
    CUT_CHECK_ERROR("kernel execution failed");

    // copy the results back and print them
    float *h_out = (float*)malloc(sizeof(float)*N);
    CUDA_SAFE_CALL( cudaMemcpy(h_out, d_out, sizeof(float)*N, cudaMemcpyDeviceToHost) );
    for (int i = 0; i < N; i++)
        printf("%f\n", h_out[i]);

    free(h_out);
    free(data);
    cudaFreeArray(cuArray);
    cudaFree(d_out);
    return 0;
    }

Running:

nvcc -o test_tex test_tex.cu

./test_tex

Gives me the expected output: an array that counts up to 127 in steps of 0.5 from the linear interpolation.
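To check output like this without a GPU, the linear filter can be modelled on the host. The sketch below follows the linear-filtering formula in the CUDA Programming Guide, tex(x) = (1 - a) * T[i] + a * T[i+1] with i = floor(x - 0.5) and a = frac(x - 0.5). It assumes unnormalized coordinates and clamp addressing, and it ignores the reduced precision the hardware uses for the blend weight. The names tex1DLinearRef and sweepRef are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Host-side reference for tex1D() with cudaFilterModeLinear.
// Assumptions: unnormalized coordinates, clamp addressing, exact float blend.
float tex1DLinearRef(const std::vector<float>& table, float x)
{
    assert(!table.empty());
    const int last = static_cast<int>(table.size()) - 1;

    // Texel centers sit at i + 0.5, so shift before splitting index and weight.
    const float xb = x - 0.5f;
    const float fl = std::floor(xb);
    const float alpha = xb - fl;            // blend weight in [0, 1)
    const int i = static_cast<int>(fl);

    // Clamp both neighbours to the valid index range.
    const int i0 = std::min(std::max(i, 0), last);
    const int i1 = std::min(std::max(i + 1, 0), last);

    return (1.0f - alpha) * table[i0] + alpha * table[i1];
}

// Mirrors the example above: a table holding 0..M-1, sampled at N coordinates
// v = t / N * M, one per thread index t.
std::vector<float> sweepRef(int M, int N)
{
    std::vector<float> table(M);
    for (int i = 0; i < M; ++i)
        table[i] = float(i);

    std::vector<float> out(N);
    for (int t = 0; t < N; ++t) {
        const float v = float(t) / float(N) * float(M);
        out[t] = tex1DLinearRef(table, v);
    }
    return out;
}
```

Under these assumptions, sweepRef(128, 256) starts with two zeros, because the first two coordinates fall at or before the center of texel 0. It then rises in steps of 0.5 and ends at 127, which matches the description of the GPU output.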

Alec · November 20, 2007, 11:26am

Thanks very much. As you say, an example is worth a thousand words. With a working example in front of me I was able to find several rather stupid mistakes, and I now have it working. The problem is that you have to get everything right before it works, which it now does. Thank you very much for your help.

Alec

MisterAnderson42 · November 20, 2007, 1:59pm

No problem.