cudaBindTexture2D and offset

philipjfry · March 10, 2011, 6:08pm

CUDA Toolkit Reference Manual 3.2 says about function template cudaBindTexture2D (4.18.2.5):

I have no clue how I should use the alignment with a tex2D call. I think I knew how to combine this with tex1Dfetch for a 1D texture, but there is no tex2Dfetch. I know that I would not run into this problem if data is aligned “properly”, but I would like to handle a case where this is just not the case…

If this is a re-post, I’m sorry. I did a search in the forum and read the post for every promising topic without luck.

njuffa · March 11, 2011, 1:56am

Basically what the texture binding does is snap the start of the texture to the next lower texture-aligned address, and returns the difference in bytes between that address and the starting address of the data we want to map. For the 1-D case, this means we need to divide the byte offset by the texel size, and add this to the texture index when accessing the texture. Also, we need to make sure the texture is lengthened by the same number of texels, to make sure the texture covers all the data we want mapped.

The two-dimensional case works analogous. By snapping the start of the texture to the next lower texture-aligned address, we are basically moving the starting column of that texture by (texture offset / texel size) texel. This means we need to add the texel offset to the column portio of our two-dimensional index when accessing the texture. Also, we need to make sure the texture width is increased by the same number of texels, to make sure the texture covers all the data we want mapped.

In the following ASCII-art graphs below (hope they survive) the thinly outlined outer boxes represent texture-aligned chunks of bytes, e.g. 512 bytes on Fermi-class device. The numbers in the corner of these boxes show the address order of these chunks. The boxes with the thinker outlines represent unaligned data portions that we want to bind to a texture.

1D case

=======

//

            <~~~~~~~~~~~ n ~~~~~~~~~~~>

+--------------+--------------+--------------+--------------+

:           +=========================+      :              :

:-- ofs --->|  data to be mapped      |      :              :

: 1         +=========================+    3 :            4 :

+--------------+--------------+--------------+--------------+

            '                         '

index       0                       n-1

//

<~~~~~~~~~~~~~~ n + ofs ~~~~~~~~~~~~~~>                            

+--------------+--------------+--------------+--------------+

+=====================================+      :              :

|             mapped data             |      :              :

+=====================================+    3 :            4 :

+--------------+--------------+--------------+--------------+

'           '                         '

0           ofs                 n+ofs-1

//

//

2D case

=======

//

                            <~~~~~~~~~~~ n ~~~~~~~~~~~>

+--------------+--------------+--------------+--------------+

:              :              :              :              :

:              :              :              :              :

: 0            : 1            : 2            :            3 :

+--------------+--------------+--------------+--------------+   row

:              :           +=========================+      : __0

:              :-- ofs --->|                         |      :

: 4            : 5         |       data              |    7 :

+--------------+-----------|                         |------+

:              :           |                         |      :

:              :           |       to be             |      :

: 8            : 9         |                         |   11 :

+--------------+-----------|                         |------+

:              :           |       mapped            |      :

:              :           |                         |      :

: 12           : 13        +=========================+   15 : __m-1 

+--------------+--------------+--------------+--------------+

                           '                         '

column                     0                       n-1

//

               <~~~~~~~~~~~~~~ n + ofs ~~~~~~~~~~~~~~>            

+--------------+--------------+--------------+--------------+

:              :              :              :              :

:              :              :              :              :

: 0            : 1            : 2            :            3 :

+--------------+--------------+--------------+--------------+   row

:              +=====================================+      : __0 

:              |                                     |      :

: 4            |         mapped                      |    7 :

+--------------+                                     |------+

:              |                                     |      :

:              |         data                        |      :

: 8            |                                     |   11 :

+--------------+                                     |------+

:              |                                     |      :

:              |                                     |      :

: 12           +=====================================+   15 : __m-1

+--------------+--------------+--------------+--------------+

               '           '                         '

column         0           ofs                 n+ofs-1

The following code demonstrates binding of a small 4x3 texture at all possible offsets in a 7x10 matrix. Each matrix element contains its row and column number (encoded as row * 100 + column), and the kernel prints the value of all matrix elements mapped by the texture.

#include <stdio.h>

#include <stdlib.h>

#define ARR_WIDTH   10

#define ARR_HEIGHT  7

#define TEX_WIDTH   3

#define TEX_HEIGHT  4

// Macro to catch CUDA errors in CUDA runtime calls

#define CUDA_SAFE_CALL(call)                                          \

do {                                                                  \

    cudaError_t err = call;                                           \

    if (cudaSuccess != err) {                                         \

        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\

                 __FILE__, __LINE__, cudaGetErrorString(err) );       \

        exit(EXIT_FAILURE);                                           \

    }                                                                 \

} while (0)

texture<float,2> tex;

__global__ void kernel (int colOfs)

{

    int x = threadIdx.x;

    int y = threadIdx.y;

    printf ("Texture value at y:%i x:%i = %03.0f\n", 

            y, x, tex2D (tex , x + colOfs + 0.5 , y + 0.5f));   

}

int main()

{

    dim3 grids(1,1);

    dim3 threads(TEX_WIDTH, TEX_HEIGHT);

    float matrix[ARR_WIDTH * ARR_HEIGHT];

    size_t texOfs = 0;

    size_t pitch = 0;

    float *devPtr = 0;

    int elemPitch;

    int colOfs = 0;

    int posX = 0;

    int posY = 0;

CUDA_SAFE_CALL (cudaMallocPitch ((void**)&devPtr, &pitch, 

                                     ARR_WIDTH*sizeof(matrix[0]), 

                                     ARR_HEIGHT));

    CUDA_SAFE_CALL (cudaMemset (devPtr, 0xff, pitch * ARR_HEIGHT));

    for (int y = 0; y < ARR_HEIGHT ; y++) {

        for (int x = 0; x < ARR_WIDTH ; x++) {

            matrix [y * ARR_WIDTH + x] = y * 100 + x;

        }

    }

    CUDA_SAFE_CALL (cudaMemcpy2D (devPtr,

                                  pitch,

                                  matrix,

                                  ARR_WIDTH*sizeof(matrix[0]),

                                  ARR_WIDTH*sizeof(matrix[0]),

                                  ARR_HEIGHT,

                                  cudaMemcpyHostToDevice));

    elemPitch = pitch / sizeof(matrix[0]);

for (posX = 0; posX <= ARR_WIDTH-TEX_WIDTH; posX++) {

        for (posY = 0; posY <= ARR_HEIGHT-TEX_HEIGHT; posY++) {

            printf ("\ntexture anchored at x=%d,y=%d\n", posX, posY);

            CUDA_SAFE_CALL (cudaBindTexture2D (&texOfs,

                                               &tex,

                                               &devPtr[posY*elemPitch+posX],

                                               &tex.channelDesc,

                                               TEX_WIDTH,

                                               TEX_HEIGHT, 

                                               pitch));

            if (texOfs != 0) {

                printf ("binding misaligned, rebinding a wider texture\n");

                colOfs = (int)(texOfs / sizeof(matrix[0]));

                CUDA_SAFE_CALL (cudaUnbindTexture(tex));

                CUDA_SAFE_CALL (cudaBindTexture2D(&texOfs,

                                                  &tex,

                                                  &devPtr[posY*elemPitch+posX],

                                                  &tex.channelDesc,

                                                  TEX_WIDTH + colOfs,

                                                  TEX_HEIGHT, 

                                                  pitch));

            }

            kernel<<<grids,threads>>>(colOfs);

            CUDA_SAFE_CALL (cudaUnbindTexture (tex));

            CUDA_SAFE_CALL (cudaDeviceSynchronize());

        }

    }

CUDA_SAFE_CALL (cudaFree (devPtr));

    return EXIT_SUCCESS;

}

philipjfry · March 11, 2011, 2:04pm

Thank you very much for the explanation!

Topic		Replies	Views
Pitched Texture delivers wrong results? CUDA Programming and Performance	5	876	November 11, 2010
Problem with arbitrary-sized textures Error "invalid argument" with cudaBindTexture2D() CUDA Programming and Performance	1	9905	October 28, 2009
About cudaBindTexture2D CUDA Programming and Performance	3	6347	March 31, 2009
Simplest texture 2D examples CUDA Programming and Performance	11	11508	March 26, 2019
How can I use the "offset" returned by cudaBindTexture() in texture fetching? cudaBindText CUDA Programming and Performance	3	2826	April 12, 2012
binding texture with linear memory CUDA Programming and Performance	1	4301	April 21, 2007
CUDA texture object with linear memory seems not to be updated when fetching CUDA Programming and Performance cuda	4	291	June 17, 2024
Simple texture problem Code will not compile. CUDA Programming and Performance	8	4144	February 4, 2010
Using Textures CUDA Programming and Performance	10	21836	March 29, 2007
How to define texture properly CUDA Programming and Performance	10	6293	November 5, 2007

cudaBindTexture2D and offset

Related topics