double for loop filling in a 2D vector

I have a set of 200 vectors each of length 361. I am allocating global memory as a linear memory of size 200 * 100 * sizeof(float).

Now I want to fill each of the 200 vectors with the values 0 to 360. I did the following:

int sampleId = threadIdx.x + blockIdx.x * blockDim.x;  // sampleId ranges from 0 to 199

for (i = 0; i < 19; i++)
{
	for (j = 0; j < 19; j++)
	{
		k = j + 19 * i;
		d_SampleData[threadIdx.x * SAMPLE_LEN + k] = k;
	}
}

// transfer data from device to host etc ..

The result I get for the first vector is correct:

h_SampleData[0][0] = 0

h_SampleData[0][1] = 1

.

.

h_SampleData[0][360] = 0

The problem starts from the second vector onwards:

h_SampleData[1][0] = 19

h_SampleData[1][1] = 20

h_SampleData[0][285] = 0

h_SampleData[0][360] = <some non-zero value, but in the range [0, 360]>

I am not able to understand what is wrong with the code.

I checked the values of threadIdx.x and confirmed that they are in the range [0, 199].

The execution configuration I use is func<<<1, NumSamples>>>(…)

where NumSamples = 200, i.e. the total number of vectors.

I expected these results:

h_SampleData[0][0] = 0

.

.

h_SampleData[0][360] = 360

For the second vector:

h_SampleData[1][0] = 0

.

.

h_SampleData[1][360] = 360

For the third vector:

h_SampleData[2][0] = 0

.

.

h_SampleData[2][360] = 360

and so on up to vector number 199, i.e.

h_SampleData[199][0] = 0

.

.

h_SampleData[199][360] = 360

Kindly help.

Shouldn’t your global space be allocated as 200 * 361, and not 200 * 100?

Also, what is the value of SAMPLE_LEN?

The value of SAMPLE_LEN is 19 * 19,

and global memory is allocated as 200 * 361 * sizeof(float); the “200 * 100” above is a typo.

Thanks for your visit.

Do you have the same problem if you use a single for loop over k, from 0 to 361? That would be simpler, right?
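A minimal sketch of the single-loop version suggested above, for comparison with the nested loops. It assumes SAMPLE_LEN is 361 and a one-block launch with exactly one thread per sample, as described in the thread. The kernel name fillSamples and the host helper runFill are hypothetical names chosen for illustration, not the poster's code.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define SAMPLE_LEN 361   // 19 * 19, as stated in the thread

// Each thread fills one sample with 0, 1, ..., SAMPLE_LEN - 1.
// Assumes the launch creates exactly one thread per sample.
__global__ void fillSamples(float *d_SampleData)
{
    int sampleId = threadIdx.x + blockIdx.x * blockDim.x;

    for (int k = 0; k < SAMPLE_LEN; ++k)
        d_SampleData[sampleId * SAMPLE_LEN + k] = (float)k;
}

// Hypothetical host helper: runs the kernel in a single block and returns
// the flat result buffer. Returns an empty vector if any CUDA call fails
// (for example, if numSamples exceeds the per-block thread limit).
std::vector<float> runFill(int numSamples)
{
    std::vector<float> h((size_t)numSamples * SAMPLE_LEN, -1.0f);
    size_t bytes = h.size() * sizeof(float);

    float *d = nullptr;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return {};

    fillSamples<<<1, numSamples>>>(d);          // one block, as in the thread
    cudaError_t err = cudaGetLastError();       // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d);
    if (err != cudaSuccess) return {};
    return h;
}
```

With this layout, element k of sample s sits at h[s * SAMPLE_LEN + k] in the returned buffer, so runFill(200)[1 * SAMPLE_LEN + 0] would be the first element of the second vector.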

This line:

d_SampleData[threadIdx.x * SAMPLE_LEN + k] = k;

implies that you are using only one thread block. Is that true?

Exactly.

I want to process several samples in fewer blocks.

For example, if I have 4000 samples, each of length 361, then I aim to use 4000/128 blocks of 128 threads each,

i.e. my execution configuration comes out to be

Func<<< ceil(4000/128) , 128>>>(…)

This way I am running 128 threads per block.

Since the code posted above uses just 200 samples, each of length 361, I am processing all 200 samples in one block.
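One caveat about the multi-block configuration: if 4000 and 128 are integers, 4000/128 is evaluated with integer division and gives 31 before ceil ever sees it, so only 31 * 128 = 3968 threads would be launched. A common pattern is an integer ceiling division plus a bounds check in the kernel. The sketch below assumes SAMPLE_LEN is 361; divUp, fillSamplesGuarded and launchFill are hypothetical names, not taken from the poster's code.

```cuda
#include <cuda_runtime.h>

#define SAMPLE_LEN 361   // 19 * 19, as stated in the thread

// Integer ceiling division: smallest block count that covers n items.
inline int divUp(int n, int d) { return (n + d - 1) / d; }

// One thread per sample; threads past the last sample do nothing.
__global__ void fillSamplesGuarded(float *d_SampleData, int numSamples)
{
    int sampleId = threadIdx.x + blockIdx.x * blockDim.x;
    if (sampleId >= numSamples) return;   // guard for the partially used last block

    for (int k = 0; k < SAMPLE_LEN; ++k)
        d_SampleData[sampleId * SAMPLE_LEN + k] = (float)k;
}

// Hypothetical launcher. d_SampleData must already point to device memory
// holding numSamples * SAMPLE_LEN floats.
cudaError_t launchFill(float *d_SampleData, int numSamples, int threadsPerBlock)
{
    int blocks = divUp(numSamples, threadsPerBlock);   // 4000 samples, 128 threads: 32 blocks
    fillSamplesGuarded<<<blocks, threadsPerBlock>>>(d_SampleData, numSamples);
    return cudaGetLastError();                         // reports launch failures
}
```

For 4000 samples this launches 32 blocks (4096 threads), and the guard keeps the last 96 threads from writing past the end of the buffer.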

The value “threadIdx.x” is not unique among thread blocks. You should use this line:

d_SampleData[sampleId * SAMPLE_LEN + k] = k;

I tried it, but again it shows the correct result for the first vector only, that is

d_SampleData[0][0] through d_SampleData[0][360].

For the second vector onwards it shows:

	SampleCropImages[1][0]	19.000000	float	(should be 0)

	SampleCropImages[1][1]	20.000000	float	(should be 1)

	SampleCropImages[1][2]	21.000000	float	(should be 2)

	SampleCropImages[1][360]	18.000000	float	(should be 360)

	SampleCropImages[1][285]	304.00000	float	(should be 285)

	SampleCropImages[2][0]	38.000000	float	(should be 0)

	SampleCropImages[2][1]	39.000000	float	(should be 1)

	SampleCropImages[20][0]	19.000000	float	(should be 0)

	SampleCropImages[21][0]	38.000000	float

	SampleCropImages[22][0]	57.000000	float

	SampleCropImages[23][0]	76.000000	float

	SampleCropImages[23][1]	77.000000	float

	SampleCropImages[23][10]	86.000000	float

	SampleCropImages[23][200]	276.00000	float

	SampleCropImages[23][300]	15.000000	float

	SampleCropImages[23][223]	299.00000	float

	SampleCropImages[23][224]	300.00000	float

	SampleCropImages[23][270]	346.00000	float

	SampleCropImages[23][285]	0.00000000	float

I am using 200 samples for the test above, each of 361 units.

My execution configuration is Func<<<1, 200>>>(…)

Thank you for your time. :)
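One pattern in the SampleCropImages values listed above: every reported entry [s][k] equals (19 * s + k) mod 361. That is what you would see if the flat buffer were filled correctly but read back with a row stride of 19 instead of 361. The thread never states what the actual mistake was, so this is only an observation about the numbers. The host-side sketch below reproduces it, with hypothetical helper names correctFlatBuffer and readWithStride.

```cuda
#include <vector>

#define SAMPLE_LEN 361   // 19 * 19, as stated in the thread

// Builds the flat buffer a correct kernel would produce:
// element k of sample s is stored at s * SAMPLE_LEN + k and holds k.
std::vector<float> correctFlatBuffer(int numSamples)
{
    std::vector<float> flat((size_t)numSamples * SAMPLE_LEN);
    for (int s = 0; s < numSamples; ++s)
        for (int k = 0; k < SAMPLE_LEN; ++k)
            flat[(size_t)s * SAMPLE_LEN + k] = (float)k;
    return flat;
}

// Reads "element k of sample s" assuming the given row stride.
// Only stride == SAMPLE_LEN matches how the buffer was written.
float readWithStride(const std::vector<float> &flat, int s, int k, int stride)
{
    return flat[(size_t)s * stride + k];
}
```

For example, with flat = correctFlatBuffer(200), readWithStride(flat, 1, 0, 19) gives 19 and readWithStride(flat, 23, 285, 19) gives 0, matching the listing, while a stride of SAMPLE_LEN gives the expected 0 and 285.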

Try this line:

k = i + 19*j;

No effect. It gives the same result as posted earlier. :(

Solved. The mistake was not in the posted code but in code after it that was not posted. Thank you, JeremiahPalmer, for your kind help and time. :)