Is texture fetching thread-safe? Error found in texture fetching, output and code included

Hi,

in http://forums.nvidia.com/index.php?showtopic=181111 I complained about problems with convolutionTexture (SDK). The level of errors is too high.

Compiling and checking the examples in 3.2, the problem persists.

The topic did not get a lot of attention earlier.

I have more information now.

The problem is in texture (2D) fetching when a number of threads access the same texel.

It appears that there are cards where texture fetching is not thread-safe.

The deviations found are all interpolation errors (fractions).

The testprogram is a simplicification based on convolutionTexture. It makes a cudaArray of floats, the floats are given random integer values (0-16).

This array is bound to a 2D texture, passed to a kernel which does a number of fetches (as in convolution) and checks the fetched texels agains the input values.

As long as each thread fetches its own texel, everything is ok; as soon as convolution makes several threads access the same texels, series of errors develop.

(using or not using the k-loop in the kernel, see code below).

These error series do not reproduce exactly run by run, but the number of errors is approximately constant (see output). The grouping of errors is such that an erroneous texel occurs 10 times on average (most fetches are correct, of course).

Example output:

[codebox][font=“Courier New”]Initializing data…

[1048, 175]: Error, y= 70, x=1423, k=-1: 12.500 vs 12.000, element 216462

[1048, 174]: Error, y= 70, x=1422, k= 0: 12.500 vs 12.000, element 216462

[1048, 173]: Error, y= 70, x=1421, k= 1: 12.500 vs 12.000, element 216462

[1048, 172]: Error, y= 70, x=1420, k= 2: 12.500 vs 12.000, element 216462

[1048, 171]: Error, y= 70, x=1419, k= 3: 12.500 vs 12.000, element 216462

[1048, 170]: Error, y= 70, x=1418, k= 4: 12.500 vs 12.000, element 216462

[1048, 169]: Error, y= 70, x=1417, k= 5: 12.500 vs 12.000, element 216462

[1048, 168]: Error, y= 70, x=1416, k= 6: 12.500 vs 12.000, element 216462

[1048, 167]: Error, y= 70, x=1415, k= 7: 12.500 vs 12.000, element 216462

[1048, 166]: Error, y= 70, x=1414, k= 8: 12.500 vs 12.000, element 216462

[3252, 111]: Error, y= 198, x=2895, k=-1: 15.500 vs 15.000, element 611150

[3252, 110]: Error, y= 198, x=2894, k= 0: 15.500 vs 15.000, element 611150

[3252, 109]: Error, y= 198, x=2893, k= 1: 15.500 vs 15.000, element 611150

[3252, 108]: Error, y= 198, x=2892, k= 2: 15.500 vs 15.000, element 611150

[3252, 107]: Error, y= 198, x=2891, k= 3: 15.500 vs 15.000, element 611150

[3252, 106]: Error, y= 198, x=2890, k= 4: 15.500 vs 15.000, element 611150

[3252, 105]: Error, y= 198, x=2889, k= 5: 15.500 vs 15.000, element 611150

[3252, 104]: Error, y= 198, x=2888, k= 6: 15.500 vs 15.000, element 611150

[3252, 103]: Error, y= 198, x=2887, k= 7: 15.500 vs 15.000, element 611150

[3252, 102]: Error, y= 198, x=2886, k= 8: 15.500 vs 15.000, element 611150

[4160, 175]: Error, y= 262, x=2063, k=-1: 6.250 vs 6.000, element 806926

[4160, 174]: Error, y= 262, x=2062, k= 0: 6.250 vs 6.000, element 806926

[4160, 173]: Error, y= 262, x=2061, k= 1: 6.250 vs 6.000, element 806926

[4160, 172]: Error, y= 262, x=2060, k= 2: 6.250 vs 6.000, element 806926

[4160, 171]: Error, y= 262, x=2059, k= 3: 6.250 vs 6.000, element 806926

[4160, 170]: Error, y= 262, x=2058, k= 4: 6.250 vs 6.000, element 806926

[4160, 169]: Error, y= 262, x=2057, k= 5: 6.250 vs 6.000, element 806926

[4160, 168]: Error, y= 262, x=2056, k= 6: 6.250 vs 6.000, element 806926

[4160, 167]: Error, y= 262, x=2055, k= 7: 6.250 vs 6.000, element 806926

[4160, 166]: Error, y= 262, x=2054, k= 8: 6.250 vs 6.000, element 806926

[5211, 47]: Error, y= 326, x= 447, k=-1: 2.125 vs 2.000, element 1001918

[5211, 46]: Error, y= 326, x= 446, k= 0: 2.125 vs 2.000, element 1001918

[5211, 45]: Error, y= 326, x= 445, k= 1: 2.125 vs 2.000, element 1001918

[5211, 44]: Error, y= 326, x= 444, k= 2: 2.125 vs 2.000, element 1001918

[6182, 111]: Error, y= 390, x= 623, k=-1: 5.250 vs 5.000, element 1198702

[6182, 110]: Error, y= 390, x= 622, k= 0: 5.250 vs 5.000, element 1198702

[6182, 109]: Error, y= 390, x= 621, k= 1: 5.250 vs 5.000, element 1198702

[6182, 108]: Error, y= 390, x= 620, k= 2: 5.250 vs 5.000, element 1198702

[6182, 107]: Error, y= 390, x= 619, k= 3: 5.250 vs 5.000, element 1198702

[6182, 106]: Error, y= 390, x= 618, k= 4: 5.250 vs 5.000, element 1198702

[6182, 105]: Error, y= 390, x= 617, k= 5: 5.250 vs 5.000, element 1198702

[6182, 104]: Error, y= 390, x= 616, k= 6: 5.250 vs 5.000, element 1198702

[6182, 103]: Error, y= 390, x= 615, k= 7: 5.250 vs 5.000, element 1198702

[6182, 102]: Error, y= 390, x= 614, k= 8: 5.250 vs 5.000, element 1198702

[6308, 111]: Error, y= 390, x=2639, k=-1: 8.500 vs 8.000, element 1200718

[6308, 110]: Error, y= 390, x=2638, k= 0: 8.500 vs 8.000, element 1200718

[6308, 109]: Error, y= 390, x=2637, k= 1: 8.500 vs 8.000, element 1200718

[6308, 108]: Error, y= 390, x=2636, k= 2: 8.500 vs 8.000, element 1200718

[6308, 107]: Error, y= 390, x=2635, k= 3: 8.500 vs 8.000, element 1200718

[6308, 106]: Error, y= 390, x=2634, k= 4: 8.500 vs 8.000, element 1200718

[6308, 105]: Error, y= 390, x=2633, k= 5: 8.500 vs 8.000, element 1200718

[6308, 104]: Error, y= 390, x=2632, k= 6: 8.500 vs 8.000, element 1200718

[6308, 103]: Error, y= 390, x=2631, k= 7: 8.500 vs 8.000, element 1200718

[6308, 102]: Error, y= 390, x=2630, k= 8: 8.500 vs 8.000, element 1200718

[7125, 175]: Error, y= 454, x= 351, k=-1: 15.500 vs 15.000, element 1395038

[7125, 174]: Error, y= 454, x= 350, k= 0: 15.500 vs 15.000, element 1395038

[7125, 173]: Error, y= 454, x= 349, k= 1: 15.500 vs 15.000, element 1395038

[7125, 172]: Error, y= 454, x= 348, k= 2: 15.500 vs 15.000, element 1395038

[7125, 171]: Error, y= 454, x= 347, k= 3: 15.500 vs 15.000, element 1395038

[7125, 170]: Error, y= 454, x= 346, k= 4: 15.500 vs 15.000, element 1395038

[7125, 169]: Error, y= 454, x= 345, k= 5: 15.500 vs 15.000, element 1395038

[7125, 168]: Error, y= 454, x= 344, k= 6: 15.500 vs 15.000, element 1395038

[7125, 167]: Error, y= 454, x= 343, k= 7: 15.500 vs 15.000, element 1395038

[7125, 166]: Error, y= 454, x= 342, k= 8: 15.500 vs 15.000, element 1395038

[7265, 175]: Error, y= 454, x=2591, k=-1: 12.500 vs 12.000, element 1397278

[7265, 174]: Error, y= 454, x=2590, k= 0: 12.500 vs 12.000, element 1397278

[7265, 173]: Error, y= 454, x=2589, k= 1: 12.500 vs 12.000, element 1397278

[7265, 172]: Error, y= 454, x=2588, k= 2: 12.500 vs 12.000, element 1397278

[7265, 171]: Error, y= 454, x=2587, k= 3: 12.500 vs 12.000, element 1397278

[7265, 170]: Error, y= 454, x=2586, k= 4: 12.500 vs 12.000, element 1397278

[7265, 169]: Error, y= 454, x=2585, k= 5: 12.500 vs 12.000, element 1397278

[7265, 168]: Error, y= 454, x=2584, k= 6: 12.500 vs 12.000, element 1397278

[7265, 167]: Error, y= 454, x=2583, k= 7: 12.500 vs 12.000, element 1397278

[7265, 166]: Error, y= 454, x=2582, k= 8: 12.500 vs 12.000, element 1397278

[8345, 35]: Error, y= 518, x=1427, k=-5: 1.063 vs 1.000, element 1592718

[8345, 34]: Error, y= 518, x=1426, k=-4: 1.063 vs 1.000, element 1592718

[8345, 33]: Error, y= 518, x=1425, k=-3: 1.063 vs 1.000, element 1592718

[8345, 32]: Error, y= 518, x=1424, k=-2: 1.063 vs 1.000, element 1592718

[8387, 32]: Error, y= 518, x=2096, k=-2: 3.125 vs 3.000, element 1593390

[8435, 47]: Error, y= 518, x=2879, k=-1: 8.500 vs 8.000, element 1594174

[8435, 46]: Error, y= 518, x=2878, k= 0: 8.500 vs 8.000, element 1594174

[8435, 45]: Error, y= 518, x=2877, k= 1: 8.500 vs 8.000, element 1594174

[8435, 44]: Error, y= 518, x=2876, k= 2: 8.500 vs 8.000, element 1594174

[8435, 43]: Error, y= 518, x=2875, k= 3: 8.500 vs 8.000, element 1594174

[8435, 42]: Error, y= 518, x=2874, k= 4: 8.500 vs 8.000, element 1594174

[8435, 41]: Error, y= 518, x=2873, k= 5: 8.500 vs 8.000, element 1594174

[8435, 40]: Error, y= 518, x=2872, k= 6: 8.500 vs 8.000, element 1594174

[8435, 39]: Error, y= 518, x=2871, k= 7: 8.500 vs 8.000, element 1594174

[8435, 38]: Error, y= 518, x=2870, k= 8: 8.500 vs 8.000, element 1594174

[8435, 46]: Error, y=518, x=2878: 8.50 vs 8.00

[10181, 175]: Error, y= 646, x= 95, k=-1: 1.063 vs 1.000, element 1984606

[10181, 174]: Error, y= 646, x= 94, k= 0: 1.063 vs 1.000, element 1984606

[10181, 173]: Error, y= 646, x= 93, k= 1: 1.063 vs 1.000, element 1984606

[11344, 47]: Error, y= 710, x= 271, k=-1: 13.500 vs 13.000, element 2181390

[11344, 46]: Error, y= 710, x= 270, k= 0: 13.500 vs 13.000, element 2181390

[11344, 45]: Error, y= 710, x= 269, k= 1: 13.500 vs 13.000, element 2181390

[11344, 44]: Error, y= 710, x= 268, k= 2: 13.500 vs 13.000, element 2181390

[12322, 111]: Error, y= 774, x= 559, k=-1: 9.500 vs 9.000, element 2378286

[12322, 110]: Error, y= 774, x= 558, k= 0: 9.500 vs 9.000, element 2378286

[12322, 109]: Error, y= 774, x= 557, k= 1: 9.500 vs 9.000, element 2378286

[12322, 108]: Error, y= 774, x= 556, k= 2: 9.500 vs 9.000, element 2378286

[12322, 107]: Error, y= 774, x= 555, k= 3: 9.500 vs 9.000, element 2378286

[12322, 106]: Error, y= 774, x= 554, k= 4: 9.500 vs 9.000, element 2378286

[12434, 111]: Error, y= 774, x=2351, k=-1: 8.500 vs 8.000, element 2380078

[12322, 105]: Error, y= 774, x= 553, k= 5: 9.500 vs 9.000, element 2378286

[12434, 110]: Error, y= 774, x=2350, k= 0: 8.500 vs 8.000, element 2380078

[12322, 104]: Error, y= 774, x= 552, k= 6: 9.500 vs 9.000, element 2378286

[12434, 109]: Error, y= 774, x=2349, k= 1: 8.500 vs 8.000, element 2380078

[12322, 103]: Error, y= 774, x= 551, k= 7: 9.500 vs 9.000, element 2378286

[12434, 108]: Error, y= 774, x=2348, k= 2: 8.500 vs 8.000, element 2380078

[12322, 102]: Error, y= 774, x= 550, k= 8: 9.500 vs 9.000, element 2378286

[12434, 107]: Error, y= 774, x=2347, k= 3: 8.500 vs 8.000, element 2380078

[12434, 106]: Error, y= 774, x=2346, k= 4: 8.500 vs 8.000, element 2380078

[12434, 105]: Error, y= 774, x=2345, k= 5: 8.500 vs 8.000, element 2380078

[12434, 104]: Error, y= 774, x=2344, k= 6: 8.500 vs 8.000, element 2380078

[12434, 103]: Error, y= 774, x=2343, k= 7: 8.500 vs 8.000, element 2380078

[12434, 102]: Error, y= 774, x=2342, k= 8: 8.500 vs 8.000, element 2380078

[13265, 175]: Error, y= 838, x= 287, k=-1: 12.500 vs 12.000, element 2574622

[13265, 174]: Error, y= 838, x= 286, k= 0: 12.500 vs 12.000, element 2574622

[13265, 173]: Error, y= 838, x= 285, k= 1: 12.500 vs 12.000, element 2574622

[13265, 172]: Error, y= 838, x= 284, k= 2: 12.500 vs 12.000, element 2574622

[16468, 175]: Error, y=1030, x=2383, k=-1: 10.500 vs 10.000, element 3166542

[16468, 174]: Error, y=1030, x=2382, k= 0: 10.500 vs 10.000, element 3166542

[16468, 173]: Error, y=1030, x=2381, k= 1: 10.500 vs 10.000, element 3166542

[16468, 172]: Error, y=1030, x=2380, k= 2: 10.500 vs 10.000, element 3166542

[16468, 171]: Error, y=1030, x=2379, k= 3: 10.500 vs 10.000, element 3166542

[16468, 170]: Error, y=1030, x=2378, k= 4: 10.500 vs 10.000, element 3166542

[16468, 169]: Error, y=1030, x=2377, k= 5: 10.500 vs 10.000, element 3166542

[16468, 168]: Error, y=1030, x=2376, k= 6: 10.500 vs 10.000, element 3166542

[16468, 167]: Error, y=1030, x=2375, k= 7: 10.500 vs 10.000, element 3166542

[16468, 166]: Error, y=1030, x=2374, k= 8: 10.500 vs 10.000, element 3166542

[19566, 175]: Error, y=1222, x=2799, k=-1: 1.063 vs 1.000, element 3756782

[19566, 174]: Error, y=1222, x=2798, k= 0: 1.063 vs 1.000, element 3756782

[19566, 173]: Error, y=1222, x=2797, k= 1: 1.063 vs 1.000, element 3756782

[19566, 172]: Error, y=1222, x=2796, k= 2: 1.063 vs 1.000, element 3756782

[19566, 171]: Error, y=1222, x=2795, k= 3: 1.063 vs 1.000, element 3756782

[19566, 170]: Error, y=1222, x=2794, k= 4: 1.063 vs 1.000, element 3756782

[19566, 169]: Error, y=1222, x=2793, k= 5: 1.063 vs 1.000, element 3756782

[19566, 168]: Error, y=1222, x=2792, k= 6: 1.063 vs 1.000, element 3756782

[19566, 167]: Error, y=1222, x=2791, k= 7: 1.063 vs 1.000, element 3756782

[19566, 166]: Error, y=1222, x=2790, k= 8: 1.063 vs 1.000, element 3756782

[20555, 33]: Error, y=1286, x= 177, k=-3: 8.500 vs 8.000, element 3950766

[20554, 47]: Error, y=1286, x= 175, k=-1: 8.500 vs 8.000, element 3950766

[20554, 46]: Error, y=1286, x= 174, k= 0: 8.500 vs 8.000, element 3950766

[20554, 45]: Error, y=1286, x= 173, k= 1: 8.500 vs 8.000, element 3950766

[21679, 111]: Error, y=1350, x=2815, k=-1: 4.250 vs 4.000, element 4150014

[21679, 110]: Error, y=1350, x=2814, k= 0: 4.250 vs 4.000, element 4150014

[21679, 109]: Error, y=1350, x=2813, k= 1: 4.250 vs 4.000, element 4150014

[21679, 108]: Error, y=1350, x=2812, k= 2: 4.250 vs 4.000, element 4150014

[21679, 107]: Error, y=1350, x=2811, k= 3: 4.250 vs 4.000, element 4150014

[21679, 106]: Error, y=1350, x=2810, k= 4: 4.250 vs 4.000, element 4150014

[21679, 105]: Error, y=1350, x=2809, k= 5: 4.250 vs 4.000, element 4150014

[21679, 104]: Error, y=1350, x=2808, k= 6: 4.250 vs 4.000, element 4150014

[21679, 103]: Error, y=1350, x=2807, k= 7: 4.250 vs 4.000, element 4150014

[21679, 102]: Error, y=1350, x=2806, k= 8: 4.250 vs 4.000, element 4150014

[22489, 175]: Error, y=1414, x= 415, k=-1: 8.500 vs 8.000, element 4344222

[22489, 174]: Error, y=1414, x= 414, k= 0: 8.500 vs 8.000, element 4344222

[22489, 173]: Error, y=1414, x= 413, k= 1: 8.500 vs 8.000, element 4344222

[22489, 172]: Error, y=1414, x= 412, k= 2: 8.500 vs 8.000, element 4344222

[22489, 171]: Error, y=1414, x= 411, k= 3: 8.500 vs 8.000, element 4344222

[22489, 170]: Error, y=1414, x= 410, k= 4: 8.500 vs 8.000, element 4344222

[22489, 169]: Error, y=1414, x= 409, k= 5: 8.500 vs 8.000, element 4344222

[22489, 168]: Error, y=1414, x= 408, k= 6: 8.500 vs 8.000, element 4344222

[22489, 167]: Error, y=1414, x= 407, k= 7: 8.500 vs 8.000, element 4344222

[22489, 166]: Error, y=1414, x= 406, k= 8: 8.500 vs 8.000, element 4344222

[22489, 174]: Error, y=1414, x=414: 8.50 vs 8.00

Reading back GPU texture fetches…

Checking the results…

Relative L2 norm: 8.579514E-005

FAILED

Shutting down…[/font][/codebox]

Code:

[font=“Courier New”][codebox]#include <stdlib.h>

#include <stdio.h>

#include <cutil_inline.h>

#define KERNEL_RADIUS 8

constant float *d_copy_input; // pointer to device copy of the input buffer, use for debugging on gpu

#define USECUPRINTF

#ifdef USECUPRINTF

#include “cuprintf.cu”

device int errorstoreport=1000;

#endif

texture<float, 2, cudaReadModeElementType> texSrc;

//Round a / b to nearest higher integer value

inline int iDivUp(int a, int B)

{

int abyb=a/b;

return (a-abyb*b > 0) ? (abyb + 1) : (abyb);

}

global void check_tex2DfetchesKernel( float *d_Dst, int imageW, int imageH )

{

const   int ix = blockDim.x * blockIdx.x + threadIdx.x;

const   int iy = blockDim.y * blockIdx.y + threadIdx.y;

const float  x = (float)ix + 0.5f;

const float  y = (float)iy + 0.5f;

float sum=0.f;

volatile float texfetch, input=0.f;

if(ix >= imageW || iy >= imageH) return;

for(int k = -KERNEL_RADIUS; k <= KERNEL_RADIUS; k++)

{

	texfetch=tex2D(texSrc, x + (float)k, y);

   #ifdef USECUPRINTF

	int readelement=max(0,ix+k); readelement=iy*imageW+min(readelement,imageW-1);

	input=d_copy_input[readelement];

	if (texfetch!=input && errorstoreport--)

		cuPrintf("Error, y=%4d, x=%4d, k=%2d: %6.3f vs %6.3f, element %d\n",iy,ix,k,texfetch,input,readelement);

   #endif

	sum += texfetch;

}

texfetch = tex2D(texSrc, x , y);

#ifdef USECUPRINTF

if (input==0.f)

{

	input = d_copy_input[iy*imageW+ix];

	if (texfetch!=input && errorstoreport--)

		cuPrintf("Error, y=%d, x=%d: %.2f vs %.2f\n",iy,ix,texfetch,input);

}

#endif

d_Dst[iy * imageW + ix] = texfetch;			// write the texvalue back for comparing on cpu

}

extern “C” void check_tex2Dfetches( float *d_Dst, cudaArray *a_Src, int imageW, int imageH )

{

dim3 threads(16, 12);

dim3 blocks(iDivUp(imageW, threads.x), iDivUp(imageH, threads.y));

#ifdef USECUPRINTF

cudaPrintfInit();

#endif

cutilSafeCall( cudaBindTextureToArray(texSrc, a_Src) );

check_tex2DfetchesKernel<<<blocks, threads>>>( d_Dst, imageW, imageH );

cutilCheckMsg("check_tex2DfetchesKernel() execution failed\n");

#ifdef USECUPRINTF

cudaPrintfDisplay(stdout, true);

cudaPrintfEnd();

#endif

cutilSafeCall( cudaUnbindTexture(texSrc) );

}

////////////////////////////////////////////////////////////////////////////////

// Main program

////////////////////////////////////////////////////////////////////////////////

int main(int argc, char **argv)

{

float *h_Input, *h_OutputGPU;

cudaArray *a_Src;

cudaChannelFormatDesc floatTex = cudaCreateChannelDesc<float>();

float *d_Output;

const int imageW = 3072;

const int imageH = 3072 / 2;

// use command-line specified CUDA device, otherwise use device with highest Gflops/s

if ( cutCheckCmdLineFlag(argc, (const char **)argv, "device")) cutilDeviceInit(argc, argv);

else cudaSetDevice( cutGetMaxGflopsDeviceId() );

printf("Initializing data...\n");

h_Input     = (float *)malloc(imageW * imageH * sizeof(float));

h_OutputGPU = (float *)malloc(imageW * imageH * sizeof(float));

cudaMallocArray(&a_Src, &floatTex, imageW, imageH) ;

cudaMalloc((void **)&d_Output, imageW * imageH * sizeof(float)) ;

srand(2009);

for(unsigned int i = 0; i < imageW * imageH; i++)

	h_Input[i] = (float)(rand() % 16);

cudaMemcpyToArray(a_Src, 0, 0, h_Input, imageW * imageH * sizeof(float), cudaMemcpyHostToDevice) ;

float *d_t;

cudaMalloc(&d_t,imageW * imageH * sizeof(float));

cudaMemcpyToSymbol("d_copy_input",&d_t,sizeof(d_copy_input));

cudaMemcpy(d_t,h_Input,imageW * imageH * sizeof(float),cudaMemcpyHostToDevice);

check_tex2Dfetches( d_Output, a_Src, imageW, imageH );

cudaThreadSynchronize() ;

printf("Reading back GPU texture fetches...\n");

cudaMemcpy(h_OutputGPU, d_Output, imageW * imageH * sizeof(float), cudaMemcpyDeviceToHost) ;

printf("Checking the results...\n");

double delta = 0.;

double sum = 0.;

for(unsigned int i = 0; i < imageW * imageH; i++)

{

	sum += h_Input[i] * h_Input[i];

	delta += (h_OutputGPU[i] - h_Input[i]) * (h_OutputGPU[i] - h_Input[i]);

}

double L2norm = sqrt(delta / sum);

printf("Relative L2 norm: %E\n", L2norm);

printf((L2norm < 1e-6) ? "PASSED\n" : "FAILED\n");

printf("Shutting down...\n");

cudaFree(d_Output) ;

cudaFreeArray(a_Src) ;

free(h_OutputGPU);

free(h_Input);

cutilExit(argc, argv);

cudaThreadExit();

}

[/codebox][/font]

The errors occur also when the card frequencies are lowered below factory settings. But when the card has reset due to an error, there are no errors.

Code run on GTX 275, compilation with VS 2008

As stated in the earlier post, the problem does not occur on all cards, e.g. not on my old 8500gt. Lacking a card-farm, I have not reproduced the errors on another card.

I am interested in reproduction of the error, but what I want to know mostly and haven’t found, is whether texture fetches are supposed to be thread-safe.

As things stand now, I will not use textures in my calculation-programs.

Hi,

in http://forums.nvidia.com/index.php?showtopic=181111 I complained about problems with convolutionTexture (SDK). The level of errors is too high.

Compiling and checking the examples in 3.2, the problem persists.

The topic did not get a lot of attention earlier.

I have more information now.

The problem is in texture (2D) fetching when a number of threads access the same texel.

It appears that there are cards where texture fetching is not thread-safe.

The deviations found are all interpolation errors (fractions).

The testprogram is a simplicification based on convolutionTexture. It makes a cudaArray of floats, the floats are given random integer values (0-16).

This array is bound to a 2D texture, passed to a kernel which does a number of fetches (as in convolution) and checks the fetched texels agains the input values.

As long as each thread fetches its own texel, everything is ok; as soon as convolution makes several threads access the same texels, series of errors develop.

(using or not using the k-loop in the kernel, see code below).

These error series do not reproduce exactly run by run, but the number of errors is approximately constant (see output). The grouping of errors is such that an erroneous texel occurs 10 times on average (most fetches are correct, of course).

Example output:

[codebox][font=“Courier New”]Initializing data…

[1048, 175]: Error, y= 70, x=1423, k=-1: 12.500 vs 12.000, element 216462

[1048, 174]: Error, y= 70, x=1422, k= 0: 12.500 vs 12.000, element 216462

[1048, 173]: Error, y= 70, x=1421, k= 1: 12.500 vs 12.000, element 216462

[1048, 172]: Error, y= 70, x=1420, k= 2: 12.500 vs 12.000, element 216462

[1048, 171]: Error, y= 70, x=1419, k= 3: 12.500 vs 12.000, element 216462

[1048, 170]: Error, y= 70, x=1418, k= 4: 12.500 vs 12.000, element 216462

[1048, 169]: Error, y= 70, x=1417, k= 5: 12.500 vs 12.000, element 216462

[1048, 168]: Error, y= 70, x=1416, k= 6: 12.500 vs 12.000, element 216462

[1048, 167]: Error, y= 70, x=1415, k= 7: 12.500 vs 12.000, element 216462

[1048, 166]: Error, y= 70, x=1414, k= 8: 12.500 vs 12.000, element 216462

[3252, 111]: Error, y= 198, x=2895, k=-1: 15.500 vs 15.000, element 611150

[3252, 110]: Error, y= 198, x=2894, k= 0: 15.500 vs 15.000, element 611150

[3252, 109]: Error, y= 198, x=2893, k= 1: 15.500 vs 15.000, element 611150

[3252, 108]: Error, y= 198, x=2892, k= 2: 15.500 vs 15.000, element 611150

[3252, 107]: Error, y= 198, x=2891, k= 3: 15.500 vs 15.000, element 611150

[3252, 106]: Error, y= 198, x=2890, k= 4: 15.500 vs 15.000, element 611150

[3252, 105]: Error, y= 198, x=2889, k= 5: 15.500 vs 15.000, element 611150

[3252, 104]: Error, y= 198, x=2888, k= 6: 15.500 vs 15.000, element 611150

[3252, 103]: Error, y= 198, x=2887, k= 7: 15.500 vs 15.000, element 611150

[3252, 102]: Error, y= 198, x=2886, k= 8: 15.500 vs 15.000, element 611150

[4160, 175]: Error, y= 262, x=2063, k=-1: 6.250 vs 6.000, element 806926

[4160, 174]: Error, y= 262, x=2062, k= 0: 6.250 vs 6.000, element 806926

[4160, 173]: Error, y= 262, x=2061, k= 1: 6.250 vs 6.000, element 806926

[4160, 172]: Error, y= 262, x=2060, k= 2: 6.250 vs 6.000, element 806926

[4160, 171]: Error, y= 262, x=2059, k= 3: 6.250 vs 6.000, element 806926

[4160, 170]: Error, y= 262, x=2058, k= 4: 6.250 vs 6.000, element 806926

[4160, 169]: Error, y= 262, x=2057, k= 5: 6.250 vs 6.000, element 806926

[4160, 168]: Error, y= 262, x=2056, k= 6: 6.250 vs 6.000, element 806926

[4160, 167]: Error, y= 262, x=2055, k= 7: 6.250 vs 6.000, element 806926

[4160, 166]: Error, y= 262, x=2054, k= 8: 6.250 vs 6.000, element 806926

[5211, 47]: Error, y= 326, x= 447, k=-1: 2.125 vs 2.000, element 1001918

[5211, 46]: Error, y= 326, x= 446, k= 0: 2.125 vs 2.000, element 1001918

[5211, 45]: Error, y= 326, x= 445, k= 1: 2.125 vs 2.000, element 1001918

[5211, 44]: Error, y= 326, x= 444, k= 2: 2.125 vs 2.000, element 1001918

[6182, 111]: Error, y= 390, x= 623, k=-1: 5.250 vs 5.000, element 1198702

[6182, 110]: Error, y= 390, x= 622, k= 0: 5.250 vs 5.000, element 1198702

[6182, 109]: Error, y= 390, x= 621, k= 1: 5.250 vs 5.000, element 1198702

[6182, 108]: Error, y= 390, x= 620, k= 2: 5.250 vs 5.000, element 1198702

[6182, 107]: Error, y= 390, x= 619, k= 3: 5.250 vs 5.000, element 1198702

[6182, 106]: Error, y= 390, x= 618, k= 4: 5.250 vs 5.000, element 1198702

[6182, 105]: Error, y= 390, x= 617, k= 5: 5.250 vs 5.000, element 1198702

[6182, 104]: Error, y= 390, x= 616, k= 6: 5.250 vs 5.000, element 1198702

[6182, 103]: Error, y= 390, x= 615, k= 7: 5.250 vs 5.000, element 1198702

[6182, 102]: Error, y= 390, x= 614, k= 8: 5.250 vs 5.000, element 1198702

[6308, 111]: Error, y= 390, x=2639, k=-1: 8.500 vs 8.000, element 1200718

[6308, 110]: Error, y= 390, x=2638, k= 0: 8.500 vs 8.000, element 1200718

[6308, 109]: Error, y= 390, x=2637, k= 1: 8.500 vs 8.000, element 1200718

[6308, 108]: Error, y= 390, x=2636, k= 2: 8.500 vs 8.000, element 1200718

[6308, 107]: Error, y= 390, x=2635, k= 3: 8.500 vs 8.000, element 1200718

[6308, 106]: Error, y= 390, x=2634, k= 4: 8.500 vs 8.000, element 1200718

[6308, 105]: Error, y= 390, x=2633, k= 5: 8.500 vs 8.000, element 1200718

[6308, 104]: Error, y= 390, x=2632, k= 6: 8.500 vs 8.000, element 1200718

[6308, 103]: Error, y= 390, x=2631, k= 7: 8.500 vs 8.000, element 1200718

[6308, 102]: Error, y= 390, x=2630, k= 8: 8.500 vs 8.000, element 1200718

[7125, 175]: Error, y= 454, x= 351, k=-1: 15.500 vs 15.000, element 1395038

[7125, 174]: Error, y= 454, x= 350, k= 0: 15.500 vs 15.000, element 1395038

[7125, 173]: Error, y= 454, x= 349, k= 1: 15.500 vs 15.000, element 1395038

[7125, 172]: Error, y= 454, x= 348, k= 2: 15.500 vs 15.000, element 1395038

[7125, 171]: Error, y= 454, x= 347, k= 3: 15.500 vs 15.000, element 1395038

[7125, 170]: Error, y= 454, x= 346, k= 4: 15.500 vs 15.000, element 1395038

[7125, 169]: Error, y= 454, x= 345, k= 5: 15.500 vs 15.000, element 1395038

[7125, 168]: Error, y= 454, x= 344, k= 6: 15.500 vs 15.000, element 1395038

[7125, 167]: Error, y= 454, x= 343, k= 7: 15.500 vs 15.000, element 1395038

[7125, 166]: Error, y= 454, x= 342, k= 8: 15.500 vs 15.000, element 1395038

[7265, 175]: Error, y= 454, x=2591, k=-1: 12.500 vs 12.000, element 1397278

[7265, 174]: Error, y= 454, x=2590, k= 0: 12.500 vs 12.000, element 1397278

[7265, 173]: Error, y= 454, x=2589, k= 1: 12.500 vs 12.000, element 1397278

[7265, 172]: Error, y= 454, x=2588, k= 2: 12.500 vs 12.000, element 1397278

[7265, 171]: Error, y= 454, x=2587, k= 3: 12.500 vs 12.000, element 1397278

[7265, 170]: Error, y= 454, x=2586, k= 4: 12.500 vs 12.000, element 1397278

[7265, 169]: Error, y= 454, x=2585, k= 5: 12.500 vs 12.000, element 1397278

[7265, 168]: Error, y= 454, x=2584, k= 6: 12.500 vs 12.000, element 1397278

[7265, 167]: Error, y= 454, x=2583, k= 7: 12.500 vs 12.000, element 1397278

[7265, 166]: Error, y= 454, x=2582, k= 8: 12.500 vs 12.000, element 1397278

[8345, 35]: Error, y= 518, x=1427, k=-5: 1.063 vs 1.000, element 1592718

[8345, 34]: Error, y= 518, x=1426, k=-4: 1.063 vs 1.000, element 1592718

[8345, 33]: Error, y= 518, x=1425, k=-3: 1.063 vs 1.000, element 1592718

[8345, 32]: Error, y= 518, x=1424, k=-2: 1.063 vs 1.000, element 1592718

[8387, 32]: Error, y= 518, x=2096, k=-2: 3.125 vs 3.000, element 1593390

[8435, 47]: Error, y= 518, x=2879, k=-1: 8.500 vs 8.000, element 1594174

[8435, 46]: Error, y= 518, x=2878, k= 0: 8.500 vs 8.000, element 1594174

[8435, 45]: Error, y= 518, x=2877, k= 1: 8.500 vs 8.000, element 1594174

[8435, 44]: Error, y= 518, x=2876, k= 2: 8.500 vs 8.000, element 1594174

[8435, 43]: Error, y= 518, x=2875, k= 3: 8.500 vs 8.000, element 1594174

[8435, 42]: Error, y= 518, x=2874, k= 4: 8.500 vs 8.000, element 1594174

[8435, 41]: Error, y= 518, x=2873, k= 5: 8.500 vs 8.000, element 1594174

[8435, 40]: Error, y= 518, x=2872, k= 6: 8.500 vs 8.000, element 1594174

[8435, 39]: Error, y= 518, x=2871, k= 7: 8.500 vs 8.000, element 1594174

[8435, 38]: Error, y= 518, x=2870, k= 8: 8.500 vs 8.000, element 1594174

[8435, 46]: Error, y=518, x=2878: 8.50 vs 8.00

[10181, 175]: Error, y= 646, x= 95, k=-1: 1.063 vs 1.000, element 1984606

[10181, 174]: Error, y= 646, x= 94, k= 0: 1.063 vs 1.000, element 1984606

[10181, 173]: Error, y= 646, x= 93, k= 1: 1.063 vs 1.000, element 1984606

[11344, 47]: Error, y= 710, x= 271, k=-1: 13.500 vs 13.000, element 2181390

[11344, 46]: Error, y= 710, x= 270, k= 0: 13.500 vs 13.000, element 2181390

[11344, 45]: Error, y= 710, x= 269, k= 1: 13.500 vs 13.000, element 2181390

[11344, 44]: Error, y= 710, x= 268, k= 2: 13.500 vs 13.000, element 2181390

[12322, 111]: Error, y= 774, x= 559, k=-1: 9.500 vs 9.000, element 2378286

[12322, 110]: Error, y= 774, x= 558, k= 0: 9.500 vs 9.000, element 2378286

[12322, 109]: Error, y= 774, x= 557, k= 1: 9.500 vs 9.000, element 2378286

[12322, 108]: Error, y= 774, x= 556, k= 2: 9.500 vs 9.000, element 2378286

[12322, 107]: Error, y= 774, x= 555, k= 3: 9.500 vs 9.000, element 2378286

[12322, 106]: Error, y= 774, x= 554, k= 4: 9.500 vs 9.000, element 2378286

[12434, 111]: Error, y= 774, x=2351, k=-1: 8.500 vs 8.000, element 2380078

[12322, 105]: Error, y= 774, x= 553, k= 5: 9.500 vs 9.000, element 2378286

[12434, 110]: Error, y= 774, x=2350, k= 0: 8.500 vs 8.000, element 2380078

[12322, 104]: Error, y= 774, x= 552, k= 6: 9.500 vs 9.000, element 2378286

[12434, 109]: Error, y= 774, x=2349, k= 1: 8.500 vs 8.000, element 2380078

[12322, 103]: Error, y= 774, x= 551, k= 7: 9.500 vs 9.000, element 2378286

[12434, 108]: Error, y= 774, x=2348, k= 2: 8.500 vs 8.000, element 2380078

[12322, 102]: Error, y= 774, x= 550, k= 8: 9.500 vs 9.000, element 2378286

[12434, 107]: Error, y= 774, x=2347, k= 3: 8.500 vs 8.000, element 2380078

[12434, 106]: Error, y= 774, x=2346, k= 4: 8.500 vs 8.000, element 2380078

[12434, 105]: Error, y= 774, x=2345, k= 5: 8.500 vs 8.000, element 2380078

[12434, 104]: Error, y= 774, x=2344, k= 6: 8.500 vs 8.000, element 2380078

[12434, 103]: Error, y= 774, x=2343, k= 7: 8.500 vs 8.000, element 2380078

[12434, 102]: Error, y= 774, x=2342, k= 8: 8.500 vs 8.000, element 2380078

[13265, 175]: Error, y= 838, x= 287, k=-1: 12.500 vs 12.000, element 2574622

[13265, 174]: Error, y= 838, x= 286, k= 0: 12.500 vs 12.000, element 2574622

[13265, 173]: Error, y= 838, x= 285, k= 1: 12.500 vs 12.000, element 2574622

[13265, 172]: Error, y= 838, x= 284, k= 2: 12.500 vs 12.000, element 2574622

[16468, 175]: Error, y=1030, x=2383, k=-1: 10.500 vs 10.000, element 3166542

[16468, 174]: Error, y=1030, x=2382, k= 0: 10.500 vs 10.000, element 3166542

[16468, 173]: Error, y=1030, x=2381, k= 1: 10.500 vs 10.000, element 3166542

[16468, 172]: Error, y=1030, x=2380, k= 2: 10.500 vs 10.000, element 3166542

[16468, 171]: Error, y=1030, x=2379, k= 3: 10.500 vs 10.000, element 3166542

[16468, 170]: Error, y=1030, x=2378, k= 4: 10.500 vs 10.000, element 3166542

[16468, 169]: Error, y=1030, x=2377, k= 5: 10.500 vs 10.000, element 3166542

[16468, 168]: Error, y=1030, x=2376, k= 6: 10.500 vs 10.000, element 3166542

[16468, 167]: Error, y=1030, x=2375, k= 7: 10.500 vs 10.000, element 3166542

[16468, 166]: Error, y=1030, x=2374, k= 8: 10.500 vs 10.000, element 3166542

[19566, 175]: Error, y=1222, x=2799, k=-1: 1.063 vs 1.000, element 3756782

[19566, 174]: Error, y=1222, x=2798, k= 0: 1.063 vs 1.000, element 3756782

[19566, 173]: Error, y=1222, x=2797, k= 1: 1.063 vs 1.000, element 3756782

[19566, 172]: Error, y=1222, x=2796, k= 2: 1.063 vs 1.000, element 3756782

[19566, 171]: Error, y=1222, x=2795, k= 3: 1.063 vs 1.000, element 3756782

[19566, 170]: Error, y=1222, x=2794, k= 4: 1.063 vs 1.000, element 3756782

[19566, 169]: Error, y=1222, x=2793, k= 5: 1.063 vs 1.000, element 3756782

[19566, 168]: Error, y=1222, x=2792, k= 6: 1.063 vs 1.000, element 3756782

[19566, 167]: Error, y=1222, x=2791, k= 7: 1.063 vs 1.000, element 3756782

[19566, 166]: Error, y=1222, x=2790, k= 8: 1.063 vs 1.000, element 3756782

[20555, 33]: Error, y=1286, x= 177, k=-3: 8.500 vs 8.000, element 3950766

[20554, 47]: Error, y=1286, x= 175, k=-1: 8.500 vs 8.000, element 3950766

[20554, 46]: Error, y=1286, x= 174, k= 0: 8.500 vs 8.000, element 3950766

[20554, 45]: Error, y=1286, x= 173, k= 1: 8.500 vs 8.000, element 3950766

[21679, 111]: Error, y=1350, x=2815, k=-1: 4.250 vs 4.000, element 4150014

[21679, 110]: Error, y=1350, x=2814, k= 0: 4.250 vs 4.000, element 4150014

[21679, 109]: Error, y=1350, x=2813, k= 1: 4.250 vs 4.000, element 4150014

[21679, 108]: Error, y=1350, x=2812, k= 2: 4.250 vs 4.000, element 4150014

[21679, 107]: Error, y=1350, x=2811, k= 3: 4.250 vs 4.000, element 4150014

[21679, 106]: Error, y=1350, x=2810, k= 4: 4.250 vs 4.000, element 4150014

[21679, 105]: Error, y=1350, x=2809, k= 5: 4.250 vs 4.000, element 4150014

[21679, 104]: Error, y=1350, x=2808, k= 6: 4.250 vs 4.000, element 4150014

[21679, 103]: Error, y=1350, x=2807, k= 7: 4.250 vs 4.000, element 4150014

[21679, 102]: Error, y=1350, x=2806, k= 8: 4.250 vs 4.000, element 4150014

[22489, 175]: Error, y=1414, x= 415, k=-1: 8.500 vs 8.000, element 4344222

[22489, 174]: Error, y=1414, x= 414, k= 0: 8.500 vs 8.000, element 4344222

[22489, 173]: Error, y=1414, x= 413, k= 1: 8.500 vs 8.000, element 4344222

[22489, 172]: Error, y=1414, x= 412, k= 2: 8.500 vs 8.000, element 4344222

[22489, 171]: Error, y=1414, x= 411, k= 3: 8.500 vs 8.000, element 4344222

[22489, 170]: Error, y=1414, x= 410, k= 4: 8.500 vs 8.000, element 4344222

[22489, 169]: Error, y=1414, x= 409, k= 5: 8.500 vs 8.000, element 4344222

[22489, 168]: Error, y=1414, x= 408, k= 6: 8.500 vs 8.000, element 4344222

[22489, 167]: Error, y=1414, x= 407, k= 7: 8.500 vs 8.000, element 4344222

[22489, 166]: Error, y=1414, x= 406, k= 8: 8.500 vs 8.000, element 4344222

[22489, 174]: Error, y=1414, x=414: 8.50 vs 8.00

Reading back GPU texture fetches…

Checking the results…

Relative L2 norm: 8.579514E-005

FAILED

Shutting down…[/font][/codebox]

Code:

[font=“Courier New”][codebox]#include <stdlib.h>

#include <stdio.h>

#include <cutil_inline.h>

#define KERNEL_RADIUS 8

constant float *d_copy_input; // pointer to device copy of the input buffer, use for debugging on gpu

#define USECUPRINTF

#ifdef USECUPRINTF

#include “cuprintf.cu”

device int errorstoreport=1000;

#endif

texture<float, 2, cudaReadModeElementType> texSrc;

//Round a / b to nearest higher integer value

inline int iDivUp(int a, int B)

{

int abyb=a/b;

return (a-abyb*b > 0) ? (abyb + 1) : (abyb);

}

global void check_tex2DfetchesKernel( float *d_Dst, int imageW, int imageH )

{

const   int ix = blockDim.x * blockIdx.x + threadIdx.x;

const   int iy = blockDim.y * blockIdx.y + threadIdx.y;

const float  x = (float)ix + 0.5f;

const float  y = (float)iy + 0.5f;

float sum=0.f;

volatile float texfetch, input=0.f;

if(ix >= imageW || iy >= imageH) return;

for(int k = -KERNEL_RADIUS; k <= KERNEL_RADIUS; k++)

{

	texfetch=tex2D(texSrc, x + (float)k, y);

   #ifdef USECUPRINTF

	int readelement=max(0,ix+k); readelement=iy*imageW+min(readelement,imageW-1);

	input=d_copy_input[readelement];

	if (texfetch!=input && errorstoreport--)

		cuPrintf("Error, y=%4d, x=%4d, k=%2d: %6.3f vs %6.3f, element %d\n",iy,ix,k,texfetch,input,readelement);

   #endif

	sum += texfetch;

}

texfetch = tex2D(texSrc, x , y);

#ifdef USECUPRINTF

if (input==0.f)

{

	input = d_copy_input[iy*imageW+ix];

	if (texfetch!=input && errorstoreport--)

		cuPrintf("Error, y=%d, x=%d: %.2f vs %.2f\n",iy,ix,texfetch,input);

}

#endif

d_Dst[iy * imageW + ix] = texfetch;			// write the texvalue back for comparing on cpu

}

extern “C” void check_tex2Dfetches( float *d_Dst, cudaArray *a_Src, int imageW, int imageH )

{

dim3 threads(16, 12);

dim3 blocks(iDivUp(imageW, threads.x), iDivUp(imageH, threads.y));

#ifdef USECUPRINTF

cudaPrintfInit();

#endif

cutilSafeCall( cudaBindTextureToArray(texSrc, a_Src) );

check_tex2DfetchesKernel<<<blocks, threads>>>( d_Dst, imageW, imageH );

cutilCheckMsg("check_tex2DfetchesKernel() execution failed\n");

#ifdef USECUPRINTF

cudaPrintfDisplay(stdout, true);

cudaPrintfEnd();

#endif

cutilSafeCall( cudaUnbindTexture(texSrc) );

}

////////////////////////////////////////////////////////////////////////////////

// Main program

////////////////////////////////////////////////////////////////////////////////

int main(int argc, char **argv)

{

float *h_Input, *h_OutputGPU;

cudaArray *a_Src;

cudaChannelFormatDesc floatTex = cudaCreateChannelDesc<float>();

float *d_Output;

const int imageW = 3072;

const int imageH = 3072 / 2;

// use command-line specified CUDA device, otherwise use device with highest Gflops/s

if ( cutCheckCmdLineFlag(argc, (const char **)argv, "device")) cutilDeviceInit(argc, argv);

else cudaSetDevice( cutGetMaxGflopsDeviceId() );

printf("Initializing data...\n");

h_Input     = (float *)malloc(imageW * imageH * sizeof(float));

h_OutputGPU = (float *)malloc(imageW * imageH * sizeof(float));

cudaMallocArray(&a_Src, &floatTex, imageW, imageH) ;

cudaMalloc((void **)&d_Output, imageW * imageH * sizeof(float)) ;

srand(2009);

for(unsigned int i = 0; i < imageW * imageH; i++)

	h_Input[i] = (float)(rand() % 16);

cudaMemcpyToArray(a_Src, 0, 0, h_Input, imageW * imageH * sizeof(float), cudaMemcpyHostToDevice) ;

float *d_t;

cudaMalloc(&d_t,imageW * imageH * sizeof(float));

cudaMemcpyToSymbol("d_copy_input",&d_t,sizeof(d_copy_input));

cudaMemcpy(d_t,h_Input,imageW * imageH * sizeof(float),cudaMemcpyHostToDevice);

check_tex2Dfetches( d_Output, a_Src, imageW, imageH );

cudaThreadSynchronize() ;

printf("Reading back GPU texture fetches...\n");

cudaMemcpy(h_OutputGPU, d_Output, imageW * imageH * sizeof(float), cudaMemcpyDeviceToHost) ;

printf("Checking the results...\n");

double delta = 0.;

double sum = 0.;

for(unsigned int i = 0; i < imageW * imageH; i++)

{

	sum += h_Input[i] * h_Input[i];

	delta += (h_OutputGPU[i] - h_Input[i]) * (h_OutputGPU[i] - h_Input[i]);

}

double L2norm = sqrt(delta / sum);

printf("Relative L2 norm: %E\n", L2norm);

printf((L2norm < 1e-6) ? "PASSED\n" : "FAILED\n");

printf("Shutting down...\n");

cudaFree(d_Output) ;

cudaFreeArray(a_Src) ;

free(h_OutputGPU);

free(h_Input);

cutilExit(argc, argv);

cudaThreadExit();

}

[/codebox][/font]

The errors occur also when the card frequencies are lowered below factory settings. But when the card has reset due to an error, there are no errors.

Code run on GTX 275, compilation with VS 2008

As stated in the earlier post, the problem does not occur on all cards, e.g. not on my old 8500gt. Lacking a card-farm, I have not reproduced the errors on another card.

I am interested in reproduction of the error, but what I want to know mostly and haven’t found, is whether texture fetches are supposed to be thread-safe.

As things stand now, I will not use textures in my calculation-programs.

Anyone?

This appears to be a serious enough issue, seeing that texture fetches make interpolation errors which makes them useless in array-retrieving.

I would feel some real gratitude if someone took the trouble to reproduce this.

Anyone?

This appears to be a serious enough issue, seeing that texture fetches make interpolation errors which makes them useless in array-retrieving.

I would feel some real gratitude if someone took the trouble to reproduce this.

If you’re a registered developer, it might be faster to file a bug via the registered developer portal (that way, it goes directly to the CUDA developers).

If you’re a registered developer, it might be faster to file a bug via the registered developer portal (that way, it goes directly to the CUDA developers).