Problem with large texture


Basically, I have a float array (denoted A) in texture memory and a float array (denoted B) in global memory. Both represent 2D arrays of floating-point values.

For A, I use a 2D cudaArray.

My kernel does a kind of matrix multiplication (only loosely), so I can verify the result.

When A is large (a width greater than approximately 16000), the result is imprecise and generally wrong. When the width is below 16000, there is no problem…
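One way to put a number on "imprecise" when verifying against a CPU reference is to report the largest relative error. This is a minimal sketch, assuming the GPU output has been copied back to the host and has the same layout as the CPU result. `maxRelativeError` is a hypothetical helper, not part of CUDA. A small nonzero error is normal for a multiplication-like kernel, because the GPU may sum terms in a different order than the CPU. Errors far above that level point to a real bug.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest relative error |gpu - cpu| / |cpu| over the
// common length of both buffers. A tiny floor on |cpu| avoids dividing by zero.
double maxRelativeError(const std::vector<float>& gpu,
                        const std::vector<float>& cpu) {
    double worst = 0.0;
    std::size_t n = std::min(gpu.size(), cpu.size());
    for (std::size_t i = 0; i < n; ++i) {
        double ref = cpu[i];
        double diff = std::fabs(static_cast<double>(gpu[i]) - ref);
        double scale = std::max(std::fabs(ref), 1e-30);
        worst = std::max(worst, diff / scale);
    }
    return worst;
}
```

Comparing the value for widths just below and just above 16000 would show whether the error jumps at a particular size.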

I don't think the problem is syntactic, but I'd rather post the texture-related code.

    // Allocation of texture memory for reference points
    cudaChannelFormatDesc channelDescA = cudaCreateChannelDesc<float>();
    result = cudaMallocArray( &ref_array, &channelDescA, ref_width, height );
    if (result){
        printErrorMessage(result, ref_width*height*size_of_float);
    }
    cudaMemcpyToArray( ref_array, 0, 0, ref_host, ref_width * height * size_of_float, cudaMemcpyHostToDevice );

    // Set texture parameters and bind texture to array
    texA.addressMode[0] = cudaAddressModeClamp;
    texA.addressMode[1] = cudaAddressModeClamp;
    texA.filterMode     = cudaFilterModePoint;
    texA.normalized     = false;
    cudaBindTextureToArray( texA, ref_array, channelDescA );

Thanks for the help.


See Appendix A of the programming guide for the maximum texture dimensions; you are exceeding them. You don’t appear to be checking for errors from cudaBindTextureToArray, which should report this error.

According to Appendix A:

“For a texture reference bound to a one-dimensional CUDA array, the maximum width is 2^13 (8192)”

“For a texture reference bound to a two-dimensional CUDA array, the maximum width is 2^16 (65536) and the maximum height is 2^15 (32768)”

In my case, the cudaArray width is 16000 or 17000 and its height is 5. The texture is defined as a 2D texture:

    texture<float, 2, cudaReadModeElementType> texA;

I don’t think I have exceeded these values… but maybe my texture is not defined correctly.
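The check described here can be written out against the limits quoted above. This is a sketch only. The constants are hardcoded from the version of the guide cited in this thread and are counted in elements (texels). Current devices report their own limits in `cudaDeviceProp::maxTexture1D` and `cudaDeviceProp::maxTexture2D`, and this code does not query them. `fits1D` and `fits2D` are hypothetical names.

```cpp
#include <cstddef>

// Limits quoted from Appendix A in this thread, in elements (texels).
// These are the quoted figures, not values queried from the device.
constexpr std::size_t kMax1DWidth  = std::size_t{1} << 13;  // 8192
constexpr std::size_t kMax2DWidth  = std::size_t{1} << 16;  // 65536
constexpr std::size_t kMax2DHeight = std::size_t{1} << 15;  // 32768

// True if a 1D CUDA array of `width` elements fits the quoted limit.
constexpr bool fits1D(std::size_t width) {
    return width >= 1 && width <= kMax1DWidth;
}

// True if a 2D CUDA array of width x height elements fits the quoted limits.
constexpr bool fits2D(std::size_t width, std::size_t height) {
    return width >= 1 && height >= 1 &&
           width <= kMax2DWidth && height <= kMax2DHeight;
}
```

With these figures, `fits2D(17000, 5)` holds, and `fits1D(16000)` does not. So the quoted limits would only be exceeded if the array were bound as a 1D texture.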

CUDA didn’t throw an error.

I’m sorry, for some reason I read your “16000” as “64000” and assumed it was over the limits. Try modifying the kernel to copy the values read from the texture directly to the output, to see whether the problem lies in the texture fetch.
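Once the copy kernel's output is back on the host, it can be compared element by element with the array that was uploaded. This is a sketch under some assumptions. Both buffers are row-major with `ref_width` floats per row, and each thread samples exactly one texel (x, y). With point filtering and `cudaReadModeElementType`, the fetched floats should match the stored ones exactly, so no tolerance is used. `firstMismatch` and `toColumnRow` are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Returns the linear index of the first element where `readback` (the copy
// kernel's output) differs from `host` (the data uploaded to the cudaArray),
// or -1 if they match. Exact comparison; assumes the data contains no NaNs.
long long firstMismatch(const std::vector<float>& host,
                        const std::vector<float>& readback) {
    std::size_t n = host.size() < readback.size() ? host.size() : readback.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (host[i] != readback[i]) return static_cast<long long>(i);
    }
    // Equal prefixes but different lengths: the first missing element differs.
    if (host.size() != readback.size()) return static_cast<long long>(n);
    return -1;
}

// Splits a row-major linear index into (column, row) for a given row width.
void toColumnRow(long long index, std::size_t width,
                 std::size_t& column, std::size_t& row) {
    column = static_cast<std::size_t>(index) % width;
    row = static_cast<std::size_t>(index) / width;
}
```

If the first mismatch always lands in a particular column range, that points at the texture fetch. If there is no mismatch at all, the rest of the kernel becomes the more likely suspect.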

This is exactly what I’m doing. I’ll post a short piece of source code in a few hours.

16000 * 4 = 64000 bytes. Is the texture width counted in elements (number of floats) or in bytes?
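The arithmetic behind this question, assuming 4-byte floats and the 65536 width limit quoted above, can be compared for both readings. As far as I know, `cudaMallocArray` takes its width in elements, and bytes only enter through the count passed to `cudaMemcpyToArray`. That makes the element reading the expected one, but the sketch computes both. The function names are hypothetical.

```cpp
#include <cstddef>

// Quoted 2D width limit (Appendix A, as cited in this thread).
constexpr std::size_t kMax2DWidth = std::size_t{1} << 16;  // 65536

// Bytes occupied by one row of `width` floats.
constexpr std::size_t rowBytes(std::size_t width) { return width * sizeof(float); }

// Widest float row allowed if the limit counts elements (texels).
constexpr std::size_t maxFloatWidthIfElements() { return kMax2DWidth; }

// Widest float row allowed if the limit counted bytes instead.
constexpr std::size_t maxFloatWidthIfBytes() { return kMax2DWidth / sizeof(float); }
```

A byte-counted limit would put the cut-off at 16384 floats, between the 16000 and 17000 widths reported above. An element-counted limit leaves both widths far below 65536.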