runtime error: cudaGetChannelDesc

I am getting a runtime error from the following function:
(in cuda_runtime.h)
template<class T, int dim, enum cudaTextureReadMode readMode>
static __inline__ __host__ cudaError_t cudaBindTextureToArray(
const struct texture<T, dim, readMode> &tex,
const struct cudaArray *array
)
{
struct cudaChannelFormatDesc desc;
cudaError_t err = cudaGetChannelDesc(&desc, array);

return err == cudaSuccess ? cudaBindTextureToArray(tex, array, desc) : err;
}

my code looks like this:

texture<float, 2, cudaReadModeElementType> texRef; // declared as a global

cudaArray* cuArray;
cudaMallocArray(&cuArray, &texRef.channelDesc, width, height);
cudaBindTextureToArray(texRef, originalImage);

My block dimension is (16 x 16) and my grid dimension is (width/16, height/16).
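For reference, here is the same setup with explicit error checking around each call. checkCuda is just a helper I am sketching for this post, and I bind the freshly allocated cuArray here to keep the example self-contained:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// same global texture reference as above
texture<float, 2, cudaReadModeElementType> texRef;

// helper: abort with a readable message if a CUDA call fails
static void checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

void setupTexture(int width, int height)
{
    cudaArray *cuArray = 0;
    checkCuda(cudaMallocArray(&cuArray, &texRef.channelDesc, width, height),
              "cudaMallocArray");
    // bind the array that was just allocated (my real code binds a different array)
    checkCuda(cudaBindTextureToArray(texRef, cuArray),
              "cudaBindTextureToArray");
}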

I do not get this error when width = height = 8192, but I get it when I increase the width and height to 16384.

Can someone please tell me what could be the problem?

Try to compile with the -cubin option and look at the .cubin file. I think you are using too many registers, since you're talking about an 8192 border. Take a look at Appendix A.1 of the programming guide.

Quoting the CUDA Programming Guide, Appendix A.1:

For a texture reference bound to linear memory, the maximum width is 2^27

8192 is 2^13, so 8192 x 8192 is 2^26. But 16384 is 2^14, so 16384 x 16384 is 2^28.

Appendix A.1 of the programming guide also says:

For a texture reference bound to a 2D CUDA array, the maximum width is 2^16 and the maximum height is 2^15

I am using 2D CUDA arrays for texturing, not linear memory, so I am not exceeding those bounds.

16384 x 16384 x 4 / 1024^3 = 1. You are allocating 1 GiB of memory. You don't mention your hardware, but that is only possible if you have a Tesla or a Quadro.
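If you want to confirm this at run time (assuming your toolkit is new enough to expose cudaMemGetInfo in the runtime API), a sketch like this will show how much free memory the device has before you allocate the array:

#include <cstdio>
#include <cuda_runtime.h>

void reportArrayFit(size_t width, size_t height)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // one float (4 bytes) per texel
    size_t needed = width * height * sizeof(float);
    printf("array needs %lu MiB, device has %lu MiB free of %lu MiB total\n",
           (unsigned long)(needed >> 20),
           (unsigned long)(freeBytes >> 20),
           (unsigned long)(totalBytes >> 20));
}

For 16384 x 16384 floats the required size comes out to 1024 MiB, so the allocation fails on any card with less free memory than that.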

You are right, I'm using a GeForce 8800 GTS, which has only 0.67 GB of memory.

Thank you Mr. Anderson, now I at least know that there is nothing wrong with my programming :)