Problem with 1D Texture and cudaAddressModeBorder

I am trying to use a 1D texture as a lookup table. I am setting cudaAddressModeBorder so that values out of range return 0, but that is not what I am getting. What am I doing wrong?

Here is a simple version of the code:

#include

float2 *lookUpTableData;
cudaArray *lookUpTable;

texture<float2, cudaTextureType1D, cudaReadModeElementType> texRef;

global void InitializeLookupTables(float2 *lookUpTableData)
{
int position = blockIdx.x * blockDim.x + threadIdx.x;
lookUpTableData[position].x = 2.0f * threadIdx.x;
lookUpTableData[position].y = 4.0f * threadIdx.x;
return;
}

global void TestLookupTables()
{
float2 lookUpTableValues;
for (int i=0; i<15; i++) {
lookUpTableValues = tex1D(texRef, 5.0f * i + 0.5f);
printf(“The value returned for value %d is %f.\n”, 5*i, lookUpTableValues.x);
}
}

int main() {
size_t lookUpTable_width = 51;
size_t lookUpTable_size = size_t (sizeof(float2) * lookUpTable_width);
cudaMalloc((void**) &lookUpTableData, lookUpTable_size);

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float2>();
cudaExtent length = make_cudaExtent(lookUpTable_width, 0, 0);
cudaMalloc3DArray(&lookUpTable, &channelDesc, length, cudaArrayDefault);
InitializeLookupTables<<<1, 51>>>(lookUpTableData);
cudaDeviceSynchronize();

cudaMemcpyToArray(lookUpTable, 0, 0, lookUpTableData, lookUpTable_size,  cudaMemcpyDeviceToDevice);
texRef.addressMode[0] = cudaAddressModeBorder;
texRef.filterMode = cudaFilterModeLinear;
texRef.normalized = false;

cudaBindTextureToArray(texRef, lookUpTable, channelDesc);

TestLookupTables<<<1, 1>>>();
cudaDeviceSynchronize();
return 0;

}

I also tried replacing the two lines

cudaExtent length = make_cudaExtent(lookUpTable_width, 0, 0);
cudaMalloc3DArray(&lookUpTable, &channelDesc, length, cudaArrayDefault);

with

cudaMallocArray(&lookUpTable, &channelDesc, lookUpTable_width, 0, cudaArrayDefault);

but get the same output.

Here is the output I get when running the program. I was expecting all values greater than 50 to print 0.0, not 100.

The value returned for value 0 is 0.000000.
The value returned for value 5 is 10.000000.
The value returned for value 10 is 20.000000.
The value returned for value 15 is 30.000000.
The value returned for value 20 is 40.000000.
The value returned for value 25 is 50.000000.
The value returned for value 30 is 60.000000.
The value returned for value 35 is 70.000000.
The value returned for value 40 is 80.000000.
The value returned for value 45 is 90.000000.
The value returned for value 50 is 100.000000.
The value returned for value 55 is 100.000000.
The value returned for value 60 is 100.000000.
The value returned for value 65 is 100.000000.
The value returned for value 70 is 100.000000.

Hello,

your experiment works as expected. With cudaAddressModeBorder the border value is returned.

I think you can not specify a value unless you do some trick, for example, adding a border of size one with the given value.

Take a look at http://docs.nvidia.com/cuda/cuda-runtime-api/index.html#group__CUDART__TYPES_1g03e1bbd2c491d809279c7a47e2cd0351 for the CUDA texture address modes

Best,
Pablo

Hi,

I must not be understanding the documentation then.

My understanding is that cudaAddressModeClamp returns the border value. And cudaAddressModeBorder should give me 0.0.

Here is what what the C Programming Guide version 5.0 says:

“The default addressing mode is to clamp the coordinates to the valid range: [0, N) for non-normalized coordinates and [0.0, 1.0) for normalized coordinates. If the border mode is specified instead, texture fetches with out-of-range texture coordinates return zero.”

Thanks,

Loren

As I had mentioned recently in a similar thread, there is a known CUDA driver bug where the Border mode is treated as ClampToEdge when non-normalized coordinates are used. I do not know which driver version was the first affected and likewise do not know which driver release will contain the fix.

You are correct that Border mode should return zero for an out-of-bounds texture access (the border value is fixed at 0 in CUDA, it is not settable).

You are right LorenS. I reviewed the documentation quickly and it was me who was confused. My apologies.

Thanks for the information. I did try to see if a similar topic has appeared, but I couldn’t get the search functions in the forum to return a reasonable list of topics to review.

Thanks,

Loren