Problems when reading from 2D Texture

k3rp · February 8, 2011, 2:22pm

Hello guys, need some help with a (hopefully trivial) problem.

I want to read values from a 2D texture and update them through a device pointer. I use the same approach as in the book “CUDA by Example” to achieve this.

float* device_ptr;
texture<float,2> tex;

int main()
{
	const int DIM = 4;
	int size = DIM * DIM * sizeof(float);

	cudaMalloc( (void**)&device_ptr , size );
	cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
	cudaBindTexture2D( NULL , tex , device_ptr , desc , DIM , DIM , sizeof(float) * DIM ) ;

	float temp[16];
	for (int y = 0; y < DIM ; y++)
		for (int x = 0; x < DIM ; x++)
			temp[ x + y * DIM ] = x + y * DIM;

	cudaMemcpy( device_ptr , temp , size , cudaMemcpyHostToDevice );

	dim3 grids(DIM/2,DIM/2);
	dim3 threads(2,2);
	kernel<<<grids,threads>>>(device_ptr);

	cudaUnbindTexture( tex );
	cudaFree( device_ptr );
	return 0;
}

The problem is that the 2D texture doesn’t seem to work properly for me. Instead of getting a 4x4 texture, I only seem to get a 1x4 texture. When I fetch values from rows 2, 3, and 4, I seem to get clamped values from row 1.

I used cuPrintf (to write values to the console) with the following kernel:

__global__ void kernel (float* device_ptr)
{
	int x = threadIdx.x + blockIdx.x * blockDim.x;
	int y = threadIdx.y + blockIdx.y * blockDim.y;
	int offset = x + y * blockDim.x * gridDim.x;

	cuPrintf("Texture value at x:%i y:%i  =  %f\n" , x , y, tex2D( tex , x , y ) );
	cuPrintf("device_ptr value at x:%i y:%i  =  %f \n\n" , x , y , device_ptr[offset] );
}

The output:

[0, 0]: Texture value at x:0 y:0  =  0.000000
[0, 0]: device_ptr value at x:0 y:0  =  0.000000

[0, 1]: Texture value at x:1 y:0  =  1.000000
[0, 1]: device_ptr value at x:1 y:0  =  1.000000

[0, 2]: Texture value at x:0 y:1  =  0.000000
[0, 2]: device_ptr value at x:0 y:1  =  4.000000

[0, 3]: Texture value at x:1 y:1  =  1.000000
[0, 3]: device_ptr value at x:1 y:1  =  5.000000

[1, 0]: Texture value at x:2 y:0  =  2.000000
[1, 0]: device_ptr value at x:2 y:0  =  2.000000

[1, 1]: Texture value at x:3 y:0  =  3.000000
[1, 1]: device_ptr value at x:3 y:0  =  3.000000

[1, 2]: Texture value at x:2 y:1  =  2.000000
[1, 2]: device_ptr value at x:2 y:1  =  6.000000

[1, 3]: Texture value at x:3 y:1  =  3.000000
[1, 3]: device_ptr value at x:3 y:1  =  7.000000

[2, 0]: Texture value at x:0 y:2  =  0.000000
[2, 0]: device_ptr value at x:0 y:2  =  8.000000

[2, 1]: Texture value at x:1 y:2  =  1.000000
[2, 1]: device_ptr value at x:1 y:2  =  9.000000

[2, 2]: Texture value at x:0 y:3  =  0.000000
[2, 2]: device_ptr value at x:0 y:3  =  12.000000

[2, 3]: Texture value at x:1 y:3  =  1.000000
[2, 3]: device_ptr value at x:1 y:3  =  13.000000

[3, 0]: Texture value at x:2 y:2  =  2.000000
[3, 0]: device_ptr value at x:2 y:2  =  10.000000

[3, 1]: Texture value at x:3 y:2  =  3.000000
[3, 1]: device_ptr value at x:3 y:2  =  11.000000

[3, 2]: Texture value at x:2 y:3  =  2.000000
[3, 2]: device_ptr value at x:2 y:3  =  14.000000

[3, 3]: Texture value at x:3 y:3  =  3.000000
[3, 3]: device_ptr value at x:3 y:3  =  15.000000

So, did I miss anything?

k3rp · February 8, 2011, 3:33pm

Hmm!

If I change

const int DIM = 4;

to

const int DIM = 16;

OR

const int DIM = 32;

OR 

const int DIM = 64;

etc

… it works. It only worked with DIM = 2^x, x > 3 (e.g. 25, 26, 27 didn’t work).

Christopher · February 9, 2011, 3:14pm

I’m having the same problem; hopefully someone will solve this soon…

pium · February 9, 2011, 4:05pm

Hi,

have you checked cudaMemcpy2D and cudaMallocPitch?

–pium

Christopher · February 10, 2011, 7:25am

I have read about them but haven’t tried them. I read about 2D textures in the CUDA by Example book, and they do exactly what k3rp describes.

But I’m wondering about one thing: the pitch. In the examples in the book they set the pitch to sizeof(float)*width, which makes me think the pitch is the size in bytes of each row.

In cudaMemcpy2D you have to pass the pitch of both dst and src, so what should be passed there?

API ref on cudaMemcpy2D

//Chris

pium · February 11, 2011, 8:37am

cudaMallocPitch() returns the pitch. The pitch is at least the width of your table in bytes, but it is generally larger in order to satisfy hardware restrictions on memory alignment (probably some power of 2).
If you are copying from the host, the src pitch is presumably the row width in bytes.

Note that I am not familiar with these functions; this is just what I understand from reading the docs.
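
A minimal host-side C++ sketch of the idea described above, not actual CUDA API code. It assumes the pitch is simply the row width in bytes rounded up to some alignment. The helper names `pitch_for_width` and `element_at` and the alignment value are hypothetical; on a real device the pitch is whatever cudaMallocPitch() reports.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: round a row width (in bytes) up to a multiple of
// `alignment`. The real value is returned by cudaMallocPitch().
std::size_t pitch_for_width(std::size_t width_bytes, std::size_t alignment)
{
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Address of element (x, y) in a pitched buffer: step `y` rows of `pitch`
// bytes from the base, then `x` floats into that row.
float* element_at(float* base, std::size_t pitch, std::size_t x, std::size_t y)
{
    char* row = reinterpret_cast<char*>(base) + y * pitch;
    return reinterpret_cast<float*>(row) + x;
}
```

With an assumed 64-byte alignment, a row of 4 floats (16 bytes) gets a 64-byte pitch, so consecutive rows start 16 floats apart rather than 4.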

njuffa · February 12, 2011, 4:24am

I am not an expert on textures, but to my knowledge, due to layout restrictions, 2D textures cannot generally be bound to linear memory. Either use pitch-linear memory allocated via cudaMallocPitch() or a cudaArray allocated via cudaMallocArray(). Below is a modified app that uses pitch-linear memory. Note that I have changed the numbering of the matrix elements versus the original app, for ease of testing on my side. I am using device-side printf(), which requires an sm_2x platform; simply change back to cuPrintf() if needed.

#include <stdio.h>
#include <stdlib.h>

#define DIM 4

#if (DIM % 2)
#error DIM must be a multiple of 2
#endif

// Macro to catch CUDA errors in CUDA runtime calls
#define CUDA_SAFE_CALL(call)                                          \
do {                                                                  \
    cudaError_t err = call;                                           \
    if (cudaSuccess != err) {                                         \
        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\
                 __FILE__, __LINE__, cudaGetErrorString(err) );       \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
} while (0)

texture<float,2> tex;

__global__ void kernel (float* device_ptr, int pitch)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * (pitch/sizeof(float));

    printf ("Texture value at x:%i y:%i  =  %f\n" , 
            x , y, tex2D( tex , x+0.5f , y+0.5f ) );   
    printf ("device_ptr value at x:%i y:%i  =  %f\n" , 
            x , y , device_ptr[offset] );
}

int main()
{
    float temp[DIM*DIM];
    size_t tex_ofs = 0;
    size_t pitch = 0;
    float *device_ptr = 0;

    CUDA_SAFE_CALL (cudaMallocPitch ((void**)&device_ptr, &pitch, DIM, DIM));
    CUDA_SAFE_CALL (cudaMemset (device_ptr, 0xff, DIM*pitch));

    for (int y = 0; y < DIM ; y++) {
        for (int x = 0; x < DIM ; x++) {
            temp[ x + y * DIM] = y + x * DIM;
        }
    }

    CUDA_SAFE_CALL (cudaMemcpy2D (device_ptr,
                                  pitch,
                                  temp,
                                  DIM*sizeof(float),
                                  DIM*sizeof(float),
                                  DIM,
                                  cudaMemcpyHostToDevice));

    CUDA_SAFE_CALL (cudaBindTexture2D (&tex_ofs,
                                       &tex,
                                       device_ptr,
                                       &tex.channelDesc,
                                       DIM,
                                       DIM, 
                                       pitch));

    if (tex_ofs != 0) {
        printf ("texture offset is not 0\n");
        exit(EXIT_FAILURE);
    }

    dim3 grids(DIM/2,DIM/2);
    dim3 threads(2,2);
    kernel<<<grids,threads>>>(device_ptr, pitch);

    CUDA_SAFE_CALL (cudaUnbindTexture (tex));
    CUDA_SAFE_CALL (cudaFree (device_ptr));

    return 0;
}
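
To make the pitch arguments of the cudaMemcpy2D() call above concrete, here is a host-only C++ sketch that mimics a pitched 2D copy with plain memcpy. The functions `copy_2d` and `read_at` are hypothetical stand-ins, not the CUDA implementation. The sketch assumes the copy moves `height` rows of `width_bytes` bytes each, advancing the source by `spitch` and the destination by `dpitch` bytes per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side stand-in for a pitched 2D copy.
// Pitches and width are in bytes; height is in rows.
void copy_2d(void* dst, std::size_t dpitch,
             const void* src, std::size_t spitch,
             std::size_t width_bytes, std::size_t height)
{
    // A row must fit inside both pitches, otherwise rows would overlap.
    assert(width_bytes <= dpitch && width_bytes <= spitch);
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(d + y * dpitch, s + y * spitch, width_bytes);
    }
}

// Read the float at (x, y) from a pitched buffer, matching the kernel's
// x + y * (pitch / sizeof(float)) offset.
float read_at(const void* buf, std::size_t pitch, std::size_t x, std::size_t y)
{
    float v;
    std::memcpy(&v,
                static_cast<const unsigned char*>(buf) + y * pitch
                    + x * sizeof(float),
                sizeof v);
    return v;
}
```

Copying a tightly packed 4x4 float array (source pitch 16 bytes) into a buffer with an assumed 64-byte pitch leaves 48 padding bytes after each row, which is why the kernel indexes with the pitch rather than with DIM.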

k3rp · March 5, 2011, 9:38am

Hey again!

I played around with your code this morning and found that it didn’t work for dimensions greater than 2^9 (the safe call macro reported an invalid argument).

I think the error is in the cudaMallocPitch call, so I just changed

cudaMallocPitch ((void**)&device_ptr, &pitch, DIM, DIM);

to

cudaMallocPitch ((void**)&device_ptr, &pitch, DIM*sizeof(float), DIM);

Lazy as I am, I just googled the cudaMallocPitch function to find the API reference and got this link, which says nothing about the width being in bytes:

http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/online/group__CUDART__MEMORY_g80d689bc903792f906e49be4a0b6d8db.html

I obviously did not notice the /cuda/2_3/ in the URL. In the 3.2 documentation, width in bytes is mentioned!

http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/online/group__CUDART__MEMORY_g80d689bc903792f906e49be4a0b6d8db.html

This also works for arbitrary dimensions, e.g.:

cudaMallocPitch ((void**)&device_ptr, &pitch, DIM_X*sizeof(float), DIM_Y);
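
The effect of passing the width in elements instead of bytes can be sketched on the host. This is a hedged C++ model, not CUDA code: `round_up` and `row_fits` are hypothetical helpers, and the alignment used to derive the pitch is an assumed parameter that in practice depends on the device.

```cpp
#include <cassert>
#include <cstddef>

// Assumed model of pitch selection: requested width rounded up to `alignment`.
std::size_t round_up(std::size_t n, std::size_t alignment)
{
    assert(alignment > 0);
    return (n + alignment - 1) / alignment * alignment;
}

// True if a row of `dim` floats fits in the pitch obtained by requesting
// `requested_width_bytes` from the allocator.
bool row_fits(std::size_t dim, std::size_t requested_width_bytes,
              std::size_t alignment)
{
    return dim * sizeof(float) <= round_up(requested_width_bytes, alignment);
}
```

Under an assumed 512-byte alignment, `row_fits(4, 4, 512)` holds because the padding hides the mistake, `row_fits(1024, 1024, 512)` does not, and `row_fits(1024, 1024 * sizeof(float), 512)` holds again. This may be why requesting DIM bytes appeared to work for small DIM; the exact threshold depends on the real alignment.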
