Hi,
Basically, I use a float* (denoted A) in texture memory and a float* (denoted B) in global memory. Both represent a 2D array of floats.
For A, I use a 2D cudaArray.
My kernel does a kind of matrix multiplication (only a kind of one), so I can verify the result.
When A is large (roughly more than 16000), the result is imprecise and generally wrong. When A is smaller than 16000, there is no problem…
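Since the result can be verified, one way to narrow this down is to compare the GPU output element by element against a CPU reference and report where the first mismatch occurs. The following is only a minimal host-side sketch: the names compare_results and CompareReport are hypothetical, and it assumes both results are stored as row-major width x height arrays of floats.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of a comparison between a GPU result and a CPU reference.
// (Hypothetical type, introduced only for this illustration.)
struct CompareReport {
    std::size_t mismatches = 0;   // number of elements outside the tolerance
    std::size_t first_row = 0;    // row of the first mismatch (valid if mismatches > 0)
    std::size_t first_col = 0;    // column of the first mismatch (valid if mismatches > 0)
    float max_rel_error = 0.0f;   // largest finite relative error seen
};

// Compares two row-major width x height arrays element by element.
// Assumes both vectors hold at least width * height values.
CompareReport compare_results(const std::vector<float>& gpu,
                              const std::vector<float>& cpu,
                              std::size_t width,
                              std::size_t height,
                              float rel_tol) {
    CompareReport report;
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            const std::size_t i = row * width + col;
            // Guard the denominator so reference values near zero do not blow up.
            const float denom = std::max(std::fabs(cpu[i]), 1e-6f);
            const float rel = std::fabs(gpu[i] - cpu[i]) / denom;
            if (rel > report.max_rel_error) {
                report.max_rel_error = rel;
            }
            // Written as !(rel <= tol) so that NaN values count as mismatches.
            if (!(rel <= rel_tol)) {
                if (report.mismatches == 0) {
                    report.first_row = row;
                    report.first_col = col;
                }
                ++report.mismatches;
            }
        }
    }
    return report;
}
```

If the first mismatch always appears at the same column or row once the size passes the threshold, that points at an indexing or size limit rather than at floating-point rounding.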
I don't think the problem is syntactic, but I prefer to paste the code related to the texture:
// Allocation of texture memory for reference points
cudaChannelFormatDesc channelDescA = cudaCreateChannelDesc<float>();
result = cudaMallocArray( &ref_array, &channelDescA, ref_width, height );
if (result){
    printErrorMessage(result, ref_width*height*size_of_float);
    cudaFree(query_dev);
    return;
}
cudaMemcpyToArray( ref_array, 0, 0, ref_host, ref_width * height * size_of_float, cudaMemcpyHostToDevice );
// Set texture parameters and bind texture to array
texA.addressMode[0] = cudaAddressModeClamp;
texA.addressMode[1] = cudaAddressModeClamp;
texA.filterMode = cudaFilterModePoint;
texA.normalized = false;
cudaBindTextureToArray( texA, ref_array, channelDescA );
Thanks for the help.
Vince