Bugged code in website

Marco_Ribero · October 2, 2015, 6:18pm

Hi all,
I’ve found a bug in the code published on http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html.

In Example 39-1 (Naive Scan Algorithm), line 16 should be

temp[pout*n+thid] = temp[pin*n+thid]+temp[pin*n+thid - offset];

because otherwise it reads the partial sum from two iterations earlier. The same error appears in the corresponding PDF (Parallel Prefix Sum (Scan) with CUDA).
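
A host-side C++ sketch of the difference, simulating one thread block sequentially. The function name `naive_scan_sim` and the `use_fix` flag are hypothetical, and the loop structure is an assumed transcription of Example 39-1 rather than the original listing. Each pass of the inner loop stands in for the work all threads do between two `__syncthreads()` calls.

```cpp
#include <vector>

// Sequential simulation of a double-buffered naive exclusive scan.
// Hypothetical helper: `use_fix` selects the corrected assignment.
std::vector<float> naive_scan_sim(const std::vector<float>& in, bool use_fix) {
    const int n = static_cast<int>(in.size());
    std::vector<float> temp(2 * n, 0.0f);  // two buffers of n elements each
    int pout = 0, pin = 1;

    // Exclusive scan: shift the input right by one, first element is 0.
    for (int thid = 0; thid < n; ++thid)
        temp[pout * n + thid] = (thid > 0) ? in[thid - 1] : 0.0f;

    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;  // swap double-buffer indices
        pin = 1 - pout;
        for (int thid = 0; thid < n; ++thid) {  // "all threads", one step
            if (thid >= offset) {
                if (use_fix)  // read both operands from the input buffer
                    temp[pout * n + thid] =
                        temp[pin * n + thid] + temp[pin * n + thid - offset];
                else  // published form: adds onto stale data left in the output buffer
                    temp[pout * n + thid] += temp[pin * n + thid - offset];
            } else {
                temp[pout * n + thid] = temp[pin * n + thid];
            }
        }
    }
    return std::vector<float>(temp.begin() + pout * n,
                              temp.begin() + pout * n + n);
}
```

With the input `{3, 1, 7, 0, 4, 1, 6, 3}`, calling the function with `use_fix = true` gives the exclusive scan `{0, 3, 4, 11, 11, 15, 16, 22}`, while `use_fix = false` does not, because the output buffer still holds values written two iterations earlier.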

In addition, I think line 14 would be better written as

pin = 1 - pin;

It is clearer, and it also avoids an execution dependency on the previous line.
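
A quick host-side check of this suggestion, assuming the listing initializes `pout = 0` and `pin = 1` before the loop (the function names here are hypothetical). Both forms yield the same index sequence because `pin` and `pout` always hold opposite values; the second form simply does not read the freshly written `pout`.

```cpp
#include <utility>
#include <vector>

// (pout, pin) per iteration when pin is derived from the new pout.
std::vector<std::pair<int, int>> swap_dependent(int steps) {
    std::vector<std::pair<int, int>> seq;
    int pout = 0, pin = 1;
    for (int i = 0; i < steps; ++i) {
        pout = 1 - pout;
        pin = 1 - pout;  // reads the value written on the previous line
        seq.emplace_back(pout, pin);
    }
    return seq;
}

// (pout, pin) per iteration when each index toggles itself.
std::vector<std::pair<int, int>> swap_independent(int steps) {
    std::vector<std::pair<int, int>> seq;
    int pout = 0, pin = 1;
    for (int i = 0; i < steps; ++i) {
        pout = 1 - pout;
        pin = 1 - pin;  // no dependency on the previous line
        seq.emplace_back(pout, pin);
    }
    return seq;
}
```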

LongY · October 4, 2015, 11:51pm

According to the pseudo code of this double buffered scan:
1: for d = 1 to log2 n do
2:     for all k in parallel do
3:         if k >= 2^d then
4:             x[out][k] = x[in][k - 2^(d-1)] + x[in][k]
5:         else
6:             x[out][k] = x[in][k]

Based on line 4, the bug does exist, as you described above.

Marco_Ribero · October 5, 2015, 9:36am

Thanks for your positive reply. Can someone tell me how to report this bug, to spare headaches for anyone who wants to implement a prefix sum?

Robert_Crovella · October 5, 2015, 12:59pm

That document (a published chapter as part of a published book) is unlikely to be changed.

Anyone who wants to implement a prefix sum can do so based on the material in that chapter but should not do so blindly without testing it (which is true for any other provided code, as well.)
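
One way to do that testing is to compare a candidate scan against a trivially correct sequential reference over a range of sizes. The following is a minimal host-side sketch; `ScanFn`, `reference_exclusive_scan`, and `scan_matches_reference` are hypothetical names, and it assumes an exclusive scan over `float` data. For a GPU implementation, the candidate would be a wrapper that copies the input to the device, launches the kernel, and copies the result back.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using ScanFn = std::function<std::vector<float>(const std::vector<float>&)>;

// Reference exclusive scan: out[i] = in[0] + ... + in[i-1], out[0] = 0.
std::vector<float> reference_exclusive_scan(const std::vector<float>& in) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 1; i < in.size(); ++i)
        out[i] = out[i - 1] + in[i - 1];
    return out;
}

// Returns true if `candidate` agrees with the reference for sizes 1..max_n.
bool scan_matches_reference(const ScanFn& candidate, int max_n,
                            float tol = 1e-4f) {
    for (int n = 1; n <= max_n; ++n) {
        std::vector<float> in(n);
        for (int i = 0; i < n; ++i)  // small deterministic test values
            in[i] = static_cast<float>((i * 7 + 3) % 5);
        const std::vector<float> got = candidate(in);
        const std::vector<float> want = reference_exclusive_scan(in);
        if (got.size() != want.size()) return false;
        for (int i = 0; i < n; ++i)
            if (std::fabs(got[i] - want[i]) > tol) return false;
    }
    return true;
}
```

Sweeping the size from 1 upward, rather than testing a single power of two, also exercises the boundary cases where an indexing bug is most likely to show.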

If you’re interested in a fast, high-performance prefix sum, you’re advised to use Thrust or CUB anyway, not the material there, which is incomplete and provided for learning purposes. (For example, as written it only works within a threadblock, not device-wide.)

If you want to file a bug with nvidia, register as a developer at developer.nvidia.com and file the bug using the portal there.

Marco_Ribero · October 6, 2015, 8:46pm

Thanks for your reply, I’ve filed the bug.

I used that code as a starting point for evaluating a recurrence equation.

LongY · October 7, 2015, 4:37pm

FYI, Harris, the author of this article, wrote in a Stack Overflow post (“CONFLICT_FREE_OFFSET macro used in the parallel prefix algorithm from GPU Gems 3”):
“I wrote that code and co-wrote the article, and I request that you use the article only for learning about scan algorithms, and do not use the code in it. It was written when CUDA was new, and I was new to CUDA.”
You can find it in the post above.

Marco_Ribero · October 8, 2015, 6:29pm

Thank you, I found an interesting link on Stack Overflow.

My interest in prefix sums comes from the need to implement a recurrence equation, y[n] = a_0 x[n] + a_1 x[n-1] + … + b_1 y[n-1] + b_2 y[n-2] + …. I’ve now found a more efficient approach than a prefix sum over matrices: partially unroll the equation (separating the ranges of X and Y) and perform two FFTs.
