# Double for loop filling in a 2D vector

I have a set of 200 vectors, each of length 361. I am allocating global memory as one linear buffer of size 200 * 100 * sizeof(float).

Now I want to fill each of the 200 vectors with the values 0 to 360. I did the following:

``````
int sampleId = threadIdx.x + blockIdx.x * blockDim.x;  // sampleId ranges from 0 to 199

for (i = 0; i < 19; i++)
{
    for (j = 0; j < 19; j++)
    {
        k = j + 19 * i;
        d_SampleData[threadIdx.x * SAMPLE_LEN + k] = k;
    }
}

// transfer data from device to host, etc.
``````

The result I get for the first vector is correct:

h_SampleData = 0

h_SampleData = 1

.

.

h_SampleData = 0

The problem starts from the second vector onwards:

h_SampleData = 19

h_SampleData = 20

h_SampleData = 0

h_SampleData = <some nonzero value, but in the range [0, 360]>

I am not able to understand what is wrong with the code.

I checked the values of threadIdx.x and confirmed that they are in the range [0, 199].

The execution configuration I use is func<<<1, NumSamples>>>(…), where NumSamples = 200, i.e. the total number of vectors.

I expected the following results:

``````
first vector:
h_SampleData = 0
.
.
h_SampleData = 360

second vector:
h_SampleData = 0
.
.
h_SampleData = 360

third vector:
h_SampleData = 0
.
.
h_SampleData = 360

and so on until vector number 199:
h_SampleData = 0
.
.
h_SampleData = 360
``````

Kindly help.

Shouldn’t your global memory be allocated as 200 * 361, and not 200 * 100?

Also, what is the value of SAMPLE_LEN?

The value of SAMPLE_LEN is 19 * 19 = 361.

Global memory is allocated as 200 * 361 * sizeof(float); the “200*100” in my post is a typo.
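The buffer size follows directly from the layout: 200 vectors of SAMPLE_LEN = 19 * 19 = 361 floats stored back to back. A minimal sketch of the allocation, assuming a hypothetical NUM_SAMPLES macro and a hypothetical helper allocSampleBuffer (neither appears in the original code):

```cuda
#include <cstddef>
#include <cuda_runtime.h>

#define NUM_SAMPLES 200        // number of vectors in the test (hypothetical name)
#define SAMPLE_LEN  (19 * 19)  // 361 floats per vector

// Hypothetical helper: one contiguous device buffer of NUM_SAMPLES * SAMPLE_LEN floats.
// Vector s occupies elements [s * SAMPLE_LEN, (s + 1) * SAMPLE_LEN).
cudaError_t allocSampleBuffer(float **d_SampleData)
{
    // 200 * 361 * sizeof(float) bytes
    size_t bytes = (size_t)NUM_SAMPLES * SAMPLE_LEN * sizeof(float);
    return cudaMalloc((void **)d_SampleData, bytes);
}
```

Computing the size in size_t from the same constants the kernel uses keeps the allocation and the indexing from drifting apart, which is exactly how a 200 * 100 vs. 200 * 361 mismatch would slip in.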

Do you have the same problem if you use a single for loop over k, from 0 to 360 (i.e. k < 361)? That would be simpler, right?
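A sketch of that single-loop variant, assuming the one-block launch func<<<1, NumSamples>>> from the question; the kernel name fillSamplesOneBlock is hypothetical. With a single block, threadIdx.x is a valid sample index, which only works while the number of samples fits in one block (at most 1024 threads on current GPUs):

```cuda
#include <cuda_runtime.h>

#define SAMPLE_LEN (19 * 19)  // 361 values per vector

// One-block version: threadIdx.x identifies the vector only because gridDim.x == 1.
__global__ void fillSamplesOneBlock(float *d_SampleData)
{
    // Start of this thread's vector in the linear buffer.
    float *vec = d_SampleData + threadIdx.x * SAMPLE_LEN;

    // Single loop: k = 0 .. 360, each value written once.
    for (int k = 0; k < SAMPLE_LEN; k++)
        vec[k] = (float)k;
}
```

Usage, for the 200-sample test: fillSamplesOneBlock<<<1, 200>>>(d_SampleData);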

This line:

d_SampleData[threadIdx.x * SAMPLE_LEN + k] = k;

implies that you are using only one thread block. Is that true?

Exactly.

I want to process several samples in fewer blocks.

For example, if I have 4000 samples, each 361 units long, I aim to split them into blocks of 128 threads, i.e. 4000/128 blocks, rounded up.

i.e. my execution configuration comes out to be

Func<<<(4000 + 128 - 1) / 128, 128>>>(…)

which is 4000/128 rounded up, i.e. 32 blocks.

This way I am processing 128 threads per block.

Since the code posted above uses just 200 samples of 361 values each, I process all 200 samples in one block.
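On the grid size for the 4000-sample case: if 4000 and 128 are integer constants, 4000/128 is truncated to 31 before ceil() ever sees it, so a launch written as ceil(4000/128) would leave 32 samples without a thread. The usual integer idiom for rounding up is (n + b - 1) / b. A sketch with a hypothetical helper name:

```cuda
#include <cuda_runtime.h>

// Smallest number of blocks such that numBlocks * threadsPerBlock >= numSamples.
// Integer round-up division, so no truncation happens before rounding.
__host__ __device__ inline unsigned int gridSizeFor(unsigned int numSamples,
                                                    unsigned int threadsPerBlock)
{
    return (numSamples + threadsPerBlock - 1) / threadsPerBlock;
}
```

For 4000 samples and 128 threads per block this gives 32 blocks, i.e. 4096 threads, so 96 threads in the last block have no sample and the kernel has to skip them with a bounds check.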

The value “threadIdx.x” is not unique among thread blocks. You should use this line:

d_SampleData[sampleId * SAMPLE_LEN + k] = k;
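A sketch of the fill kernel written for multiple blocks, assuming the number of samples is passed as a parameter (the names fillSamples and numSamples are hypothetical). It keeps the question's 19 x 19 double loop, indexes by the globally unique sampleId, and adds the bounds check needed once the grid is rounded up:

```cuda
#include <cuda_runtime.h>

#define SAMPLE_LEN (19 * 19)  // 361 values per vector

__global__ void fillSamples(float *d_SampleData, int numSamples)
{
    // Unique across all blocks, unlike threadIdx.x alone.
    int sampleId = threadIdx.x + blockIdx.x * blockDim.x;
    if (sampleId >= numSamples)
        return;  // surplus threads in the last block write nothing

    // Same double loop as in the question; k takes each value 0 .. 360 once.
    for (int i = 0; i < 19; i++)
        for (int j = 0; j < 19; j++) {
            int k = j + 19 * i;
            d_SampleData[sampleId * SAMPLE_LEN + k] = (float)k;
        }
}
```

Usage for 4000 samples: fillSamples<<<(4000 + 127) / 128, 128>>>(d_SampleData, 4000);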

I tried it, but again it shows the correct result only for the first vector, that is

d_SampleData till d_SampleData

From the second vector onwards it shows:

``````
SampleCropImages	19.000000	float  : should have started again from 0, and so on
SampleCropImages	20.000000	float  : should be 1
SampleCropImages	21.000000	float  : should be 2
SampleCropImages	18.000000	float  : should be 360
SampleCropImages	304.00000	float  : should be 285
SampleCropImages	38.000000	float  : should have started again from 0
SampleCropImages	39.000000	float  : should be 1
SampleCropImages	19.000000	float  : should be 0
SampleCropImages	38.000000	float
SampleCropImages	57.000000	float
SampleCropImages	76.000000	float
SampleCropImages	77.000000	float
SampleCropImages	86.000000	float
SampleCropImages	276.00000	float
SampleCropImages	15.000000	float
SampleCropImages	299.00000	float
SampleCropImages	300.00000	float
SampleCropImages	346.00000	float
SampleCropImages	0.00000000	float
``````

I am using 200 samples for the test above, each of 361 units.

My execution configuration is Func<<<1, 200>>>(…)

Thank you for your time. :)

Try this line:

k = i + 19*j;
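Swapping the roles of i and j changes only the order in which a thread writes its elements, not which elements it writes: both j + 19*i and i + 19*j visit every k in 0..360 exactly once. A host-side sketch that checks this (the function name is hypothetical):

```cuda
// Returns true if the 19 x 19 double loop produces every index 0 .. 360 exactly once,
// using k = j + 19*i (swapRoles == false) or k = i + 19*j (swapRoles == true).
bool coversEachIndexOnce(bool swapRoles)
{
    int hits[19 * 19] = {0};
    for (int i = 0; i < 19; i++)
        for (int j = 0; j < 19; j++) {
            int k = swapRoles ? i + 19 * j : j + 19 * i;
            hits[k]++;
        }
    for (int k = 0; k < 19 * 19; k++)
        if (hits[k] != 1)
            return false;
    return true;
}
```

Both mappings pass, so within a single vector the final contents are identical either way, which is consistent with the change having no effect.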

No effect; it gives the same result as posted earlier. :(

Solved. The mistake was not in the posted code but in the code after it, which I did not post. Thank you, JeremiahPalmer, for your kind help and time. :)
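The actual fix was not posted, so the following is only a hedged sketch of a host-side check for this layout that can help localize a problem in the code after the kernel: copy the whole buffer back (numSamples * SAMPLE_LEN floats, not just one vector) and report the first element that does not equal its position k within its vector. The function names are hypothetical:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

#define SAMPLE_LEN (19 * 19)  // 361 values per vector

// Linear index of the first element where h[s * SAMPLE_LEN + k] != k, or -1 if all match.
long firstMismatch(const float *h_SampleData, int numSamples)
{
    for (int s = 0; s < numSamples; s++)
        for (int k = 0; k < SAMPLE_LEN; k++) {
            long idx = (long)s * SAMPLE_LEN + k;
            if (h_SampleData[idx] != (float)k)
                return idx;
        }
    return -1;
}

// Copies the full device buffer to the host and checks it; returns -2 if the copy fails.
long copyAndCheck(float *h_SampleData, const float *d_SampleData, int numSamples)
{
    size_t bytes = (size_t)numSamples * SAMPLE_LEN * sizeof(float);
    if (cudaMemcpy(h_SampleData, d_SampleData, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        return -2;
    return firstMismatch(h_SampleData, numSamples);
}
```

A returned index idx maps back to vector idx / SAMPLE_LEN, element idx % SAMPLE_LEN.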