Can't get rid of convolution texture errors on GTX 275

Hi,

Every once in a while I come back to the texture-based convolution example from the SDK, and I keep getting errors even though there shouldn’t be any.

On a device with few processors (an 8500 GT, for example) there are no errors.

The same build (compiled for sm_11) on a 200-series device with many processors gives errors.

These errors are confined to the row convolution. The column convolution is flawless.

It is also surprising that the GPU consistently gives higher results than the CPU.

The adapted SDK code is enclosed. The output varies between executions.

Typical output on my 275:[codebox]Using CUDA device [0]: GeForce GTX 275

Initializing data…

Running GPU rows convolution (1 identical ITERATIONS)…

Average convolutionRowsGPU() time: 1.620440 msecs; //2911.920208 Mpix/s

Checking the row convolution results…

…running convolutionRowsCPU()

Row Convolution: output GPU - output CPU: 3.50 at index 1591654

Row Convolution: output GPU - output CPU: 1.50 at index 1591655

Row Convolution: output GPU - output CPU: 3.00 at index 1591656

Row Convolution: output GPU - output CPU: 1.00 at index 1591657

Row Convolution: output GPU - output CPU: 1.50 at index 1591658

Row Convolution: output GPU - output CPU: 2.50 at index 1591659

Row Convolution: output GPU - output CPU: 7.50 at index 1591660

Row Convolution: output GPU - output CPU: 7.00 at index 1591661

Row Convolution: output GPU - output CPU: 5.50 at index 1591662

Row Convolution: output GPU - output CPU: 4.00 at index 1591663

Row Convolution: output GPU - output CPU: 3.50 at index 2378230

Row Convolution: output GPU - output CPU: 1.50 at index 2378231

Row Convolution: output GPU - output CPU: 3.00 at index 2378232

Row Convolution: output GPU - output CPU: 1.00 at index 2378233

Row Convolution: output GPU - output CPU: 1.50 at index 2378234

Row Convolution: output GPU - output CPU: 2.50 at index 2378235

Row Convolution: output GPU - output CPU: 7.50 at index 2378236

Row Convolution: output GPU - output CPU: 7.00 at index 2378237

Row Convolution: output GPU - output CPU: 5.50 at index 2378238

Row Convolution: output GPU - output CPU: 4.00 at index 2378239

Row Convolution: output GPU - output CPU: 3.50 at index 2967574

Row Convolution: output GPU - output CPU: 1.50 at index 2967575

Row Convolution: output GPU - output CPU: 3.00 at index 2967576

Row Convolution: output GPU - output CPU: 1.00 at index 2967577

Row Convolution: output GPU - output CPU: 1.50 at index 2967578

Row Convolution: output GPU - output CPU: 2.50 at index 2967579

Row Convolution: output GPU - output CPU: 7.50 at index 2967580

Row Convolution: output GPU - output CPU: 7.00 at index 2967581

Row Convolution: output GPU - output CPU: 5.50 at index 2967582

Row Convolution: output GPU - output CPU: 4.00 at index 2967583

Row Convolution: output GPU - output CPU: 1.75 at index 3361702

Row Convolution: output GPU - output CPU: 0.75 at index 3361703

Row Convolution: output GPU - output CPU: 1.50 at index 3361704

Row Convolution: output GPU - output CPU: 0.50 at index 3361705

Row Convolution: output GPU - output CPU: 0.75 at index 3361706

Row Convolution: output GPU - output CPU: 1.25 at index 3361707

Row Convolution: output GPU - output CPU: 3.75 at index 3361708

Row Convolution: output GPU - output CPU: 3.50 at index 3361709

Row Convolution: output GPU - output CPU: 2.75 at index 3361710

Row Convolution: output GPU - output CPU: 2.00 at index 3361711

Row Convolution: output GPU - output CPU: 3.50 at index 3950598

Row Convolution: output GPU - output CPU: 1.50 at index 3950599

Row Convolution: output GPU - output CPU: 3.00 at index 3950600

Row Convolution: output GPU - output CPU: 1.00 at index 3950601

Row Convolution: output GPU - output CPU: 1.50 at index 3950602

Row Convolution: output GPU - output CPU: 2.50 at index 3950603

Row Convolution: output GPU - output CPU: 7.50 at index 3950604

Row Convolution: output GPU - output CPU: 7.00 at index 3950605

Row Convolution: output GPU - output CPU: 5.50 at index 3950606

Row Convolution: output GPU - output CPU: 4.00 at index 3950607

Row Convolution: output GPU - output CPU: 3.50 at index 4343830

Row Convolution: output GPU - output CPU: 1.50 at index 4343831

Row Convolution: output GPU - output CPU: 3.00 at index 4343832

Row Convolution: output GPU - output CPU: 1.00 at index 4343833

Row Convolution: output GPU - output CPU: 1.50 at index 4343834

Row Convolution: output GPU - output CPU: 2.50 at index 4343835

Row Convolution: output GPU - output CPU: 7.50 at index 4343836

Row Convolution: output GPU - output CPU: 7.00 at index 4343837

Row Convolution: output GPU - output CPU: 5.50 at index 4343838

Row Convolution: output GPU - output CPU: 4.00 at index 4343839

Row Convolution: output GPU - output CPU: 1.75 at index 4540502

Row Convolution: output GPU - output CPU: 0.75 at index 4540503

Row Convolution: output GPU - output CPU: 1.50 at index 4540504

Row Convolution: output GPU - output CPU: 0.50 at index 4540505

Row Convolution: output GPU - output CPU: 0.75 at index 4540506

Row Convolution: output GPU - output CPU: 1.25 at index 4540507

Row Convolution: output GPU - output CPU: 3.75 at index 4540508

Row Convolution: output GPU - output CPU: 3.50 at index 4540509

Row Convolution: output GPU - output CPU: 2.75 at index 4540510

Row Convolution: output GPU - output CPU: 2.00 at index 4540511

Relative L2 norm: 1.539866E-005

FAILED

Copying convolutionRowCPU() output back to the texture…

cudaMemcpyToArray() time: 16.86 msecs; //279.839840 Mpix/s

Running GPU columns convolution (1 ITERATIONS)

Average convolutionColumnsGPU() time: 1.688360 msecs; //2794.778405 Mpix/s

Reading back GPU results…

…running convolutionColumnsCPU()

Relative L2 norm: 0.000000E+000

PASSED

Shutting down…[/codebox]
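For anyone reading the log, the "Relative L2 norm" line is presumably computed along the lines below. This is only a sketch in C++: it assumes the check is sqrt(sum((gpu - cpu)^2) / sum(cpu^2)), and `relativeL2` is a hypothetical name, not something taken from the SDK source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Relative L2 norm of the difference between GPU and CPU results,
// measured against the CPU reference (assumed definition, see above):
//   sqrt( sum_i (gpu[i] - cpu[i])^2 / sum_i cpu[i]^2 )
double relativeL2(const std::vector<float>& cpu, const std::vector<float>& gpu) {
    assert(cpu.size() == gpu.size());
    double errSq = 0.0;  // accumulated squared difference
    double refSq = 0.0;  // accumulated squared reference value
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double d = static_cast<double>(gpu[i]) - static_cast<double>(cpu[i]);
        errSq += d * d;
        refSq += static_cast<double>(cpu[i]) * static_cast<double>(cpu[i]);
    }
    if (refSq == 0.0) {
        // All-zero reference: the ratio is undefined unless both outputs agree.
        return errSq == 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
    }
    return std::sqrt(errSq / refSq);
}
```

If the norm is defined this way, it is taken over the whole image, so a few dozen bad pixels can give a value as small as 1.5E-005 even though individual differences go up to 7.5.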

You will notice that the errors come in short runs, with no pattern that I can see.
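One way to look for structure is to turn each failing linear index into a row, a column, and an offset within a 16-element segment. The C++ sketch below does that. `PixelPos` and `locate` are hypothetical names, and the default width of 3072 is an assumption: the time and throughput in the log imply 4,718,592 pixels, which would fit 3072 x 1536, but the log never prints the image size.

```cpp
#include <cassert>
#include <cstddef>

// Position of a linear output index within the image.
struct PixelPos {
    std::size_t row;       // index / width
    std::size_t col;       // index % width
    std::size_t offset16;  // index % 16, offset within a 16-element segment
};

// Maps a linear index to (row, col, offset16).
// The default width of 3072 is an assumption, not a value printed in the log.
PixelPos locate(std::size_t index, std::size_t width = 3072) {
    assert(width > 0);
    return PixelPos{index / width, index % width, index % 16};
}
```

For example, `locate(1591654)` gives row 518, column 358 under the assumed width. Independent of the width, the first index of each of the seven runs in the log is 6 modulo 16, and each run has 10 entries, so every run covers offsets 6 to 15 of a 16-element segment. Whether that says anything about the cause, the log alone does not settle.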

I suspect that the blocks interfere somehow, but I have no idea how this would happen.

Ultimately, I would not use texture code for convolution, but it bothers me that I can’t find a solution.

Can anyone reproduce this problem?