cudaMalloc() segfault when allocating a large 1D char texture?

I’m trying to bind a 1D signed char texture:

texture<char, 1, cudaReadModeElementType> referenceTex;

to a chunk of device memory, set up from the host side:

referenceTex.addressMode[0] = cudaAddressModeClamp;
referenceTex.addressMode[1] = cudaAddressModeClamp;
referenceTex.filterMode = cudaFilterModePoint;
referenceTex.normalized = false; // access with unnormalized (integer) texture coordinates

CUDA_SAFE_CALL( cudaMalloc( &cu_refarray, cudaCreateChannelDesc<char>(), reflen));
cudaBindTexture( referenceTex, cu_refarray);
CUDA_SAFE_CALL( cudaMemcpy( cu_refarray, refstr, reflen, cudaMemcpyHostToDevice));
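For comparison, here is a minimal sketch of the same 1D char lookup written with CUDA texture objects over linear device memory (cudaCreateTextureObject, available since CUDA 5.0 on compute capability 3.0 or newer) instead of the texture-reference API used above, which was removed in CUDA 12.0. Every runtime call is checked, so a failing allocation, copy, or texture creation prints an error string instead of being silently ignored. The host string, the kernel copyThroughTexture, and the helpers gridSizeFor and makeCharTexture are hypothetical names for illustration only. One related detail: if cu_refarray is a cudaArray*, a plain cudaMemcpy into it is not valid, because arrays are filled with cudaMemcpy2DToArray (or the older, now deprecated cudaMemcpyToArray); this sketch sidesteps that by binding linear memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Report the failing call with file/line and abort, instead of continuing after an error.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s: %s\n", __FILE__, __LINE__, \
                         #call, cudaGetErrorString(err_));              \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Number of blocks needed to cover n elements with `threads` threads per block.
int gridSizeFor(int n, int threads) { return (n + threads - 1) / threads; }

// Each thread fetches one char through the texture and writes it to out[] for checking.
__global__ void copyThroughTexture(cudaTextureObject_t tex, char* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<char>(tex, i);
}

// Wrap an existing linear device buffer of n chars in a 1D texture object.
cudaTextureObject_t makeCharTexture(char* d_buf, size_t n) {
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_buf;
    res.res.linear.desc = cudaCreateChannelDesc<char>();
    res.res.linear.sizeInBytes = n;  // sizeof(char) == 1

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.addressMode[0] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;  // return raw chars, no conversion to float
    td.normalizedCoords = 0;                // integer indices, like normalized = false

    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));
    return tex;
}

int main() {
    const char* refstr = "ACGTNACGTACGT";  // hypothetical stand-in for the host string
    const int reflen = static_cast<int>(std::strlen(refstr));

    char* d_ref = nullptr;
    char* d_out = nullptr;
    CHECK(cudaMalloc(&d_ref, reflen));
    CHECK(cudaMalloc(&d_out, reflen));
    CHECK(cudaMemcpy(d_ref, refstr, reflen, cudaMemcpyHostToDevice));

    cudaTextureObject_t tex = makeCharTexture(d_ref, reflen);
    const int threads = 256;
    copyThroughTexture<<<gridSizeFor(reflen, threads), threads>>>(tex, d_out, reflen);
    CHECK(cudaGetLastError());

    // This cudaMemcpy is blocking, so it also waits for the kernel to finish.
    std::vector<char> back(reflen);
    CHECK(cudaMemcpy(back.data(), d_out, reflen, cudaMemcpyDeviceToHost));
    std::printf("%s\n", std::memcmp(back.data(), refstr, reflen) == 0 ? "match" : "mismatch");

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFree(d_out));
    CHECK(cudaFree(d_ref));
    return 0;
}
```

Linear memory was chosen here because it needs no separate array allocation or array copy, and 1D linear textures generally allow much larger widths than 1D CUDA arrays.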

I’m getting a segfault on the cudaMalloc call. This happens even when reflen is relatively small, such as 4096. It only succeeds when reflen is very small (reflen = 10 works). Allocating large 2D textures of ulong4 elements works without any problems.
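Since the failure depends on reflen, one thing worth ruling out is the device's 1D texture width limit, which differs between textures backed by a CUDA array and textures backed by linear memory. Below is a minimal sketch that queries both limits on the current device, assuming a CUDA 11.1 or newer toolkit (for cudaDeviceGetTexture1DLinearMaxWidth); the helper fitsLimit and the command-line argument are hypothetical. An oversized request would normally come back as an error code rather than a crash, so this check only rules out one possible cause.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report the failing call with file/line and abort.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s: %s\n", __FILE__, __LINE__, \
                         #call, cudaGetErrorString(err_));              \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// True if a 1D texture of n elements fits under the given width limit.
bool fitsLimit(size_t n, size_t limit) { return n <= limit; }

int main(int argc, char** argv) {
    // Requested length: 4096 is the size from the post, overridable on the command line.
    size_t reflen = (argc > 1) ? std::strtoull(argv[1], nullptr, 10) : 4096;
    int dev = 0;
    CHECK(cudaGetDevice(&dev));

    // Width limit for a 1D texture backed by a CUDA array.
    int maxArrayWidth = 0;
    CHECK(cudaDeviceGetAttribute(&maxArrayWidth, cudaDevAttrMaxTexture1DWidth, dev));

    // Width limit for a 1D texture over linear memory with a char channel.
    size_t maxLinearWidth = 0;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<char>();
    CHECK(cudaDeviceGetTexture1DLinearMaxWidth(&maxLinearWidth, &desc, dev));

    std::printf("reflen = %zu\n", reflen);
    std::printf("CUDA array 1D max width:    %d (%s)\n", maxArrayWidth,
                fitsLimit(reflen, static_cast<size_t>(maxArrayWidth)) ? "fits" : "too large");
    std::printf("linear memory 1D max width: %zu (%s)\n", maxLinearWidth,
                fitsLimit(reflen, maxLinearWidth) ? "fits" : "too large");
    return 0;
}
```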

Storing the same string in plain global memory instead (using cudaMalloc(&d_refarray, reflen)) works fine.

Does anyone have an idea what could be causing the failure?