CUDA wrong summation results

Waddaa13 · July 7, 2016, 4:26pm

please close

njuffa · July 7, 2016, 4:37pm

This is impossible to diagnose from the fragmentary code snippets shown. If you desire help with debugging, I would suggest posting small, self-contained, buildable and runnable code, that reproduces the problem, along with information on the nvcc invocation used to build the code.

Waddaa13 · July 7, 2016, 4:41pm

I am using Nsight. I would like to post the whole code, but it would be very hard to follow; what I posted is pretty much everything that could be related to the problem.
This is my nvcc command:

nvcc --cudart static -Xlinker -lgomp --relocatable-device-code=true -gencode arch=compute_50,code=compute_50 -gencode arch=compute_50,code=sm_50 -link -o

Robert_Crovella · July 7, 2016, 4:54pm

Do you have multiple threads writing to the same location in memory?

Waddaa13 · July 7, 2016, 5:04pm

please close

Waddaa13 · July 7, 2016, 5:10pm

Yes, they write to the same array, but not to the same element inside the array.

SPWorley · July 7, 2016, 6:07pm

Not related to your question or bug, but you launch too many blocks when nElems is a multiple of the blocksize. It is harmless in your code since you check for index number in the kernel, but best to launch only as many blocks as you need.

Change

ElemsCalc<<<(nElems/128)+1,128>>>

to

ElemsCalc<<<((nElems+127)/128),128>>>
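To make the difference concrete, here is a minimal host-side sketch in plain C++ (no GPU needed) that compares the two grid-size expressions. The helper names blocksPlusOne and blocksCeil are hypothetical, chosen only for illustration; the block size of 128 is the one from the launch above, and the element counts are made-up examples.

```cpp
// Host-side arithmetic only; no kernel is launched here.
// blocksPlusOne and blocksCeil are hypothetical helper names.

// Grid size as in the original launch: (nElems / blockSize) + 1
constexpr int blocksPlusOne(int nElems, int blockSize) {
    return nElems / blockSize + 1;
}

// Grid size with ceiling division: (nElems + blockSize - 1) / blockSize
constexpr int blocksCeil(int nElems, int blockSize) {
    return (nElems + blockSize - 1) / blockSize;
}

// Not a multiple of 128: both expressions give the same grid size.
static_assert(blocksPlusOne(1000, 128) == 8, "7 full blocks + 1 partial");
static_assert(blocksCeil(1000, 128) == 8, "same result");

// Exact multiple of 128: the +1 form launches one extra, idle block.
static_assert(blocksPlusOne(1024, 128) == 9, "one block more than needed");
static_assert(blocksCeil(1024, 128) == 8, "exactly enough blocks");
```

The two forms differ only when nElems is an exact multiple of the block size, and the bounds check in the kernel is still needed either way because the last block is usually only partly filled.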

Waddaa13 · July 7, 2016, 6:28pm

I am afraid that your edit will make no difference.

BulatZiganshin · July 7, 2016, 7:38pm

It makes a difference when nElems == 128*k. It is a pretty common off-by-one error.

Waddaa13 · July 7, 2016, 7:53pm

Yes, you are right, but my number of elements (nElems) would have a 0.0000001% chance of being a multiple of 128.
