Inaccuracy with Nested (2) For Loops: Is there an issue with nested for loops?

skp · February 4, 2008, 7:03am

Hey folks,

I am working on something that requires loading multiple batches of data into shared memory and processing them further. I have reduced my problem to a snippet of code and would be very thankful if someone could share their thoughts.

Suppose the following code runs for, say, just 1 block.

When I read back the variable “tracknum” I certainly get 5000 from both the first and second index of tracknum

On the contrary if I use the following code

Assume MAX_DATA to be something which can be stored in Shared Memory

The output for tracknum[0] stays 5000, and tracknum[1] reports 2000.

I am using CUDA 1.1, in case it matters.

Thank you for any information you could share

skp

yk_cadcg · February 5, 2008, 6:46am

hi,

I don't have a CUDA 1.1 card and am not familiar with the atomic functions.

Is it to do with the __syncthreads()? You call atomicAdd a different number of times in different threads. I'm eager to know how the hardware handles this atomicAdd and the subsequent __syncthreads().

wumpus · February 5, 2008, 8:29am

Please be more clear on what values you expect, because we cannot say anything about the output if we don’t know the input. I suppose numData*numpass is 2000?

skp · February 5, 2008, 8:45am

I think if you look at the code you will realize that the output should be 5000 in the second case too. numData takes the value of MAX_DATA until the data left is less than MAX_DATA; in that case, whatever is left is the value of numData.

thus for a case of 5000

numData would take the value 256 for 19 times (consider MAX_DATA = 256)

and the value 136 for the last pass.

thus I should get 256 * 19 + 136 = 5000 as the output even for trackNum[1]
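The batching arithmetic above can be checked on the host with a minimal sketch. This is not the kernel from the original post (that code is not shown here); the function names `batchSizes` and `sumBatches` are hypothetical, and MAX_DATA = 256 is just the example value used above.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the numData value for each pass when
// `total` items are processed in batches of at most `maxData` items.
std::vector<int> batchSizes(int total, int maxData) {
    std::vector<int> sizes;
    int remaining = total;
    while (remaining > 0) {
        // Full batch while enough data is left, otherwise the remainder.
        int numData = std::min(maxData, remaining);
        sizes.push_back(numData);
        remaining -= numData;
    }
    return sizes;
}

// Hypothetical helper: the total a per-pass counter should reach
// if every pass contributes exactly numData.
int sumBatches(const std::vector<int>& sizes) {
    return std::accumulate(sizes.begin(), sizes.end(), 0);
}
```

With `batchSizes(5000, 256)` this gives 19 passes of 256 followed by one pass of 136, and `sumBatches` returns 5000, which is the value expected in the second counter as well.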

Let me know if something is not clear

Thanks for reply

skp · February 5, 2008, 12:14pm

Sorry for a typo in the post above: an instance of tricount should be read as dataCount.
