Hi,
Every now and then I come back to convolution using textures. It's the SDK example, and I keep getting errors even though there shouldn't be any.
On a device with few multiprocessors (an 8500 GT, for example) there are no errors.
The same build (compiled for sm_11) on a GTX 200-series device with many multiprocessors produces errors.
These errors are confined to the row convolution. The column convolution is flawless.
It is also surprising that the GPU result is consistently higher than the CPU result.
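For reference, here is a minimal C++ sketch of what a CPU row pass like convolutionRowsCPU() computes. It is not the SDK's code: the name convolutionRowsReference is hypothetical, and the sketch assumes a row-major image, a kernel of 2*radius+1 taps, and clamp-to-edge borders (which is what a texture read with clamp addressing returns for out-of-range x). If the GPU and CPU really compute the same sum, they should differ only by float rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the row (horizontal) pass.
// Assumptions: row-major image, 2*radius+1 taps, clamp-to-edge borders.
std::vector<float> convolutionRowsReference(const std::vector<float>& src,
                                            const std::vector<float>& kernel,
                                            int width, int height, int radius)
{
    assert(width > 0 && height > 0 && radius >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    assert(kernel.size() == static_cast<std::size_t>(2 * radius + 1));

    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                int d = x + k;
                if (d < 0) d = 0;                 // clamp at the left edge
                if (d > width - 1) d = width - 1; // clamp at the right edge
                // flipped kernel index: convolution, not correlation
                sum += src[y * width + d] * kernel[radius - k];
            }
            dst[y * width + x] = sum;
        }
    }
    return dst;
}
```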
The adapted SDK code is enclosed. The output varies between executions.
Typical output on my GTX 275:
[codebox]Using CUDA device [0]: GeForce GTX 275
Initializing data…
Running GPU rows convolution (1 identical ITERATIONS)…
Average convolutionRowsGPU() time: 1.620440 msecs; //2911.920208 Mpix/s
Checking the row convolution results…
…running convolutionRowsCPU()
Row Convolution: output GPU - output CPU: 3.50 at index 1591654
Row Convolution: output GPU - output CPU: 1.50 at index 1591655
Row Convolution: output GPU - output CPU: 3.00 at index 1591656
Row Convolution: output GPU - output CPU: 1.00 at index 1591657
Row Convolution: output GPU - output CPU: 1.50 at index 1591658
Row Convolution: output GPU - output CPU: 2.50 at index 1591659
Row Convolution: output GPU - output CPU: 7.50 at index 1591660
Row Convolution: output GPU - output CPU: 7.00 at index 1591661
Row Convolution: output GPU - output CPU: 5.50 at index 1591662
Row Convolution: output GPU - output CPU: 4.00 at index 1591663
Row Convolution: output GPU - output CPU: 3.50 at index 2378230
Row Convolution: output GPU - output CPU: 1.50 at index 2378231
Row Convolution: output GPU - output CPU: 3.00 at index 2378232
Row Convolution: output GPU - output CPU: 1.00 at index 2378233
Row Convolution: output GPU - output CPU: 1.50 at index 2378234
Row Convolution: output GPU - output CPU: 2.50 at index 2378235
Row Convolution: output GPU - output CPU: 7.50 at index 2378236
Row Convolution: output GPU - output CPU: 7.00 at index 2378237
Row Convolution: output GPU - output CPU: 5.50 at index 2378238
Row Convolution: output GPU - output CPU: 4.00 at index 2378239
Row Convolution: output GPU - output CPU: 3.50 at index 2967574
Row Convolution: output GPU - output CPU: 1.50 at index 2967575
Row Convolution: output GPU - output CPU: 3.00 at index 2967576
Row Convolution: output GPU - output CPU: 1.00 at index 2967577
Row Convolution: output GPU - output CPU: 1.50 at index 2967578
Row Convolution: output GPU - output CPU: 2.50 at index 2967579
Row Convolution: output GPU - output CPU: 7.50 at index 2967580
Row Convolution: output GPU - output CPU: 7.00 at index 2967581
Row Convolution: output GPU - output CPU: 5.50 at index 2967582
Row Convolution: output GPU - output CPU: 4.00 at index 2967583
Row Convolution: output GPU - output CPU: 1.75 at index 3361702
Row Convolution: output GPU - output CPU: 0.75 at index 3361703
Row Convolution: output GPU - output CPU: 1.50 at index 3361704
Row Convolution: output GPU - output CPU: 0.50 at index 3361705
Row Convolution: output GPU - output CPU: 0.75 at index 3361706
Row Convolution: output GPU - output CPU: 1.25 at index 3361707
Row Convolution: output GPU - output CPU: 3.75 at index 3361708
Row Convolution: output GPU - output CPU: 3.50 at index 3361709
Row Convolution: output GPU - output CPU: 2.75 at index 3361710
Row Convolution: output GPU - output CPU: 2.00 at index 3361711
Row Convolution: output GPU - output CPU: 3.50 at index 3950598
Row Convolution: output GPU - output CPU: 1.50 at index 3950599
Row Convolution: output GPU - output CPU: 3.00 at index 3950600
Row Convolution: output GPU - output CPU: 1.00 at index 3950601
Row Convolution: output GPU - output CPU: 1.50 at index 3950602
Row Convolution: output GPU - output CPU: 2.50 at index 3950603
Row Convolution: output GPU - output CPU: 7.50 at index 3950604
Row Convolution: output GPU - output CPU: 7.00 at index 3950605
Row Convolution: output GPU - output CPU: 5.50 at index 3950606
Row Convolution: output GPU - output CPU: 4.00 at index 3950607
Row Convolution: output GPU - output CPU: 3.50 at index 4343830
Row Convolution: output GPU - output CPU: 1.50 at index 4343831
Row Convolution: output GPU - output CPU: 3.00 at index 4343832
Row Convolution: output GPU - output CPU: 1.00 at index 4343833
Row Convolution: output GPU - output CPU: 1.50 at index 4343834
Row Convolution: output GPU - output CPU: 2.50 at index 4343835
Row Convolution: output GPU - output CPU: 7.50 at index 4343836
Row Convolution: output GPU - output CPU: 7.00 at index 4343837
Row Convolution: output GPU - output CPU: 5.50 at index 4343838
Row Convolution: output GPU - output CPU: 4.00 at index 4343839
Row Convolution: output GPU - output CPU: 1.75 at index 4540502
Row Convolution: output GPU - output CPU: 0.75 at index 4540503
Row Convolution: output GPU - output CPU: 1.50 at index 4540504
Row Convolution: output GPU - output CPU: 0.50 at index 4540505
Row Convolution: output GPU - output CPU: 0.75 at index 4540506
Row Convolution: output GPU - output CPU: 1.25 at index 4540507
Row Convolution: output GPU - output CPU: 3.75 at index 4540508
Row Convolution: output GPU - output CPU: 3.50 at index 4540509
Row Convolution: output GPU - output CPU: 2.75 at index 4540510
Row Convolution: output GPU - output CPU: 2.00 at index 4540511
Relative L2 norm: 1.539866E-005
FAILED
Copying convolutionRowCPU() output back to the texture…
cudaMemcpyToArray() time: 16.86 msecs; //279.839840 Mpix/s
Running GPU columns convolution (1 ITERATIONS)
Average convolutionColumnsGPU() time: 1.688360 msecs; //2794.778405 Mpix/s
Reading back GPU results…
…running convolutionColumnsCPU()
Relative L2 norm: 0.000000E+000
PASSED
Shutting down…[/codebox]
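The check in the log (per-index differences followed by a relative L2 norm) can be sketched on the host as below. The function name compareWithReference, the tolerance parameter and the norm formula sqrt(sum((gpu - cpu)^2) / sum(cpu^2)) are assumptions for illustration, not taken from the sample's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical GPU-vs-CPU comparison in the style of the log.
// Prints every index where |gpu - cpu| > tol and returns the
// relative L2 norm sqrt(sum((gpu - cpu)^2) / sum(cpu^2)).
double compareWithReference(const std::vector<float>& gpu,
                            const std::vector<float>& cpu,
                            double tol, std::size_t* badCount = nullptr)
{
    assert(gpu.size() == cpu.size());
    double delta = 0.0, sum = 0.0;
    std::size_t bad = 0;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double diff = static_cast<double>(gpu[i]) - cpu[i];
        delta += diff * diff;
        sum += static_cast<double>(cpu[i]) * cpu[i];
        if (std::fabs(diff) > tol) {
            std::printf("output GPU - output CPU: %.2f at index %zu\n", diff, i);
            ++bad;
        }
    }
    if (badCount) *badCount = bad;
    // fall back to the absolute norm if the reference is all zeros
    return sum > 0.0 ? std::sqrt(delta / sum) : std::sqrt(delta);
}
```

Because the norm is taken over the whole image, a handful of bad runs only moves it to about 1.5E-005 here, so printing the individual indices is what actually locates the problem.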
As you can see, the errors come in short runs of ten consecutive indices, and there is no obvious pattern in where the runs occur.
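Grouping the bad indices into runs and printing each run's start modulo a small power of two can expose alignment that is hard to see by eye. For example, every run in the log above is ten indices long and starts at an index that is 6 modulo 16, so each run ends at an index that is 15 modulo 16. 16 is the half-warp size on compute capability 1.x hardware, but whether that is relevant here is only a guess. The helper below (groupIntoRuns and Run are hypothetical names) assumes the bad indices are already sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A run of consecutive mismatching indices.
struct Run {
    std::size_t start;   // first index of the run
    std::size_t length;  // number of consecutive indices
};

// Groups sorted mismatch indices into runs of consecutive indices.
std::vector<Run> groupIntoRuns(const std::vector<std::size_t>& badIndices)
{
    std::vector<Run> runs;
    for (std::size_t idx : badIndices) {
        if (!runs.empty() && runs.back().start + runs.back().length == idx)
            ++runs.back().length;       // extends the current run
        else
            runs.push_back({idx, 1});   // starts a new run
    }
    return runs;
}
```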
I suspect that the thread blocks are somehow interfering with each other, but I have no idea how that could happen.
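One way to test the block-interference idea is to translate each bad index into pixel coordinates and into the block and thread that would have written it, then see whether the runs cluster at block edges or at a fixed threadIdx.x range. This sketch assumes one thread per output pixel and a grid that starts at pixel (0, 0); imageW, blockW and blockH are hypothetical parameters that must be replaced with the values from the actual code.

```cpp
#include <cassert>
#include <cstddef>

// Pixel position and launch coordinates for one linear output index.
struct PixelInfo {
    std::size_t x, y;              // pixel coordinates
    std::size_t blockX, blockY;    // corresponding blockIdx.x, blockIdx.y
    std::size_t threadX, threadY;  // corresponding threadIdx.x, threadIdx.y
};

// Assumes a row-major image of width imageW and blockW x blockH thread
// blocks, one thread per output pixel (an assumption, not the SDK layout).
PixelInfo locate(std::size_t index, std::size_t imageW,
                 std::size_t blockW, std::size_t blockH)
{
    assert(imageW > 0 && blockW > 0 && blockH > 0);
    PixelInfo p{};
    p.x = index % imageW;
    p.y = index / imageW;
    p.blockX = p.x / blockW;
    p.blockY = p.y / blockH;
    p.threadX = p.x % blockW;
    p.threadY = p.y % blockH;
    return p;
}
```

If every run maps to the same threadIdx.x range regardless of which block it lands in, that would point at something inside a block rather than between blocks.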
Ultimately, I would not use texture code for convolution, but it bothers me that I can’t find a solution.
Can anyone reproduce this problem?