cirus
September 18, 2009, 4:04pm
1
I have a set of 200 vectors, each of length 361. I am allocating global memory as linear memory of size 200 * 100 * sizeof(float).
Now I want to fill each of the 200 vectors with the values 0 to 360. I did the following:
int sampleId = threadIdx.x + blockIdx.x * blockDim.x; // where sampleId ranges from 0 to 199
for(i = 0; i < 19; i++)
{
    for(j = 0; j < 19; j++)
    {
        k = j + 19*i;
        d_SampleData[threadIdx.x * SAMPLE_LEN + k] = k;
    }
}
// transfer data from device to host etc ..
The result I get for the first vector is correct:
h_SampleData[0][0] = 0
h_SampleData[0][1] = 1
.
.
h_SampleData[0][360] = 0
The problem starts from the second vector onwards:
h_SampleData[1][0] = 19
h_SampleData[1][1] = 20
…
h_SampleData[1][285] = 0
h_SampleData[1][360] = <some non-zero value, but in the range [0,360]>
I am not able to understand what is wrong with the code.
I checked the values of threadIdx.x and confirmed that they are in the range [0,199].
The execution configuration I use is func<<<1, NumSamples>>>(…)
where NumSamples = 200, i.e. the total number of vectors.
I expected the results to be:
h_SampleData[0][0] = 0
.
.
h_SampleData[0][360] = 360
for the second vector:
h_SampleData[1][0] = 0
.
.
h_SampleData[1][360] = 360
for the third vector:
h_SampleData[2][0] = 0
.
.
h_SampleData[2][360] = 360
and so on up to vector number 199,
i.e.
h_SampleData[199][0] = 0
.
.
h_SampleData[199][360] = 360
Kindly help.
Shouldn’t your global space be allocated as 200 * 361, and not 200 * 100?
Also, what is the value of SAMPLE_LEN?
cirus
September 18, 2009, 4:45pm
3
The value of SAMPLE_LEN is 19 * 19,
and global memory is allocated as 200 * 361 * sizeof(float) … “200*100” is a typo.
Thanks for your visit.
Do you have the same problem if you use a single for loop over k, from 0 to 360 (i.e. k < 361)? That would be simpler, right?
This line:
d_SampleData[threadIdx.x * SAMPLE_LEN + k] = k;
implies that you are using only one thread block. Is that true?
cirus
September 18, 2009, 7:12pm
6
Exactly.
I want to process several samples with fewer blocks.
For example, if I have 4000 samples, each of 361 units in length, then I aim to use 4000/128 blocks of 128 threads each,
i.e. my execution configuration comes out to be
Func<<< ceil(4000/128) , 128>>>(…)
This way I am running 128 threads per block.
Since the code posted above uses just 200 samples, each of 361 units, I am processing all 200 samples in one block.
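A side note on that launch configuration: in C, `4000/128` with integer operands is truncated to 31 before `ceil` ever sees it, so 32 of the 4000 samples would get no thread. The following is a minimal sketch of the usual integer round-up; the helper name `blocksFor` is hypothetical and not from the thread.

```cuda
#include <cstdio>

// Hypothetical helper (not from the thread): smallest block count such that
// numBlocks * threadsPerBlock >= numSamples, using integer arithmetic only.
__host__ __device__ inline int blocksFor(int numSamples, int threadsPerBlock)
{
    return (numSamples + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int numSamples = 4000;
    const int threadsPerBlock = 128;

    // Integer division truncates: 4000 / 128 == 31, covering only 3968 samples.
    printf("truncated:  %d blocks\n", numSamples / threadsPerBlock);

    // Rounding up gives 32 blocks (4096 threads). The kernel then needs a
    // guard such as `if (sampleId < numSamples)` for the 96 surplus threads.
    printf("rounded up: %d blocks\n", blocksFor(numSamples, threadsPerBlock));
    return 0;
}
```

For the 200-sample test in this thread, `blocksFor(200, 200)` gives 1, which matches the `<<<1, 200>>>` configuration used above.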
The value “threadIdx.x” is not unique among thread blocks. You should use this line:
d_SampleData[sampleId * SAMPLE_LEN + k] = k;
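Putting the suggestions from this thread together (single loop over k, global `sampleId` index), a complete test program might look like the sketch below. It is an illustration under assumptions, not the original poster's code: the kernel name `fillSamples`, the `numSamples` parameter, the bounds guard, and the host-side check are all added here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define SAMPLE_LEN (19 * 19)   // 361, as stated in the thread

// Each thread fills one vector of SAMPLE_LEN floats with 0, 1, ..., 360.
// Kernel name and numSamples parameter are assumptions for this sketch.
__global__ void fillSamples(float *d_SampleData, int numSamples)
{
    int sampleId = threadIdx.x + blockIdx.x * blockDim.x;
    if (sampleId >= numSamples) return;   // surplus threads in the last block

    for (int k = 0; k < SAMPLE_LEN; ++k)
        d_SampleData[sampleId * SAMPLE_LEN + k] = (float)k;
}

int main()
{
    const int numSamples = 4000;
    const int threadsPerBlock = 128;
    const int numBlocks = (numSamples + threadsPerBlock - 1) / threadsPerBlock;
    const size_t count = (size_t)numSamples * SAMPLE_LEN;

    float *d_SampleData = nullptr;
    if (cudaMalloc(&d_SampleData, count * sizeof(float)) != cudaSuccess) return 1;

    fillSamples<<<numBlocks, threadsPerBlock>>>(d_SampleData, numSamples);

    // cudaMemcpy waits for the kernel and also reports launch/execution errors.
    std::vector<float> h_SampleData(count);
    cudaError_t err = cudaMemcpy(h_SampleData.data(), d_SampleData,
                                 count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_SampleData);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Host check: index with the same row length (SAMPLE_LEN) the kernel used.
    size_t mismatches = 0;
    for (int s = 0; s < numSamples; ++s)
        for (int k = 0; k < SAMPLE_LEN; ++k)
            if (h_SampleData[(size_t)s * SAMPLE_LEN + k] != (float)k) ++mismatches;

    printf("mismatches: %zu\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The host check reads the flat buffer with the same `SAMPLE_LEN` stride as the kernel; a host-side array declared with a different row length would show values that look shifted even when the kernel itself is correct.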
cirus
September 18, 2009, 8:23pm
8
Tried it, but again it shows the correct result for the first vector only, that is,
d_SampleData[0][0] through d_SampleData[0][360].
For the second vector onwards it shows:
SampleCropImages[1][0] 19.000000 float (should be 0)
SampleCropImages[1][1] 20.000000 float (should be 1)
SampleCropImages[1][2] 21.000000 float (should be 2)
SampleCropImages[1][360] 18.000000 float (should be 360)
SampleCropImages[1][285] 304.00000 float (should be 285)
SampleCropImages[2][0] 38.000000 float (should be 0)
SampleCropImages[2][1] 39.000000 float (should be 1)
SampleCropImages[20][0] 19.000000 float (should be 0)
SampleCropImages[21][0] 38.000000 float
SampleCropImages[22][0] 57.000000 float
SampleCropImages[23][0] 76.000000 float
SampleCropImages[23][1] 77.000000 float
SampleCropImages[23][10] 86.000000 float
SampleCropImages[23][200] 276.00000 float
SampleCropImages[23][300] 15.000000 float
SampleCropImages[23][223] 299.00000 float
SampleCropImages[23][224] 300.00000 float
SampleCropImages[23][270] 346.00000 float
SampleCropImages[23][285] 0.00000000 float
I am using 200 samples for the test above, each of 361 units.
My execution configuration is Func<<<1, 200>>>(…)
Thank you for your time. :)
cirus
September 18, 2009, 9:24pm
10
No effect. It gives the same result as posted earlier. :(
cirus
September 19, 2009, 1:09am
11
Solved. There was a mistake, not in the posted code but in the code after it, which was not posted. Thank you, JeremiahPalmer, for your kind help and time. :)