Hi.
I have a class that generates a set of objects in an array. For every object I perform a set of operations, so I want to parallelize those functions for each cell.
The problem is that this array could contain millions of items O.o!!! I looked at some samples of array handling and found something like this:
__global__ void worldUpdKernel( MyItems * Darr, int * id )
{
    int i = threadIdx.x;
    if (Darr[i].marked_for_death()) {
        id[i] = 32 + Darr[i].cellID;
    }
    else {
        id[i] = 200;
    }
}
I noticed that they use threadIdx.x to step through the array and perform the different operations; in this case, checking whether the condition applies and writing a value into the int array. When I read back id[i] to check that my kernel behaves correctly, I see that when I reach element 255 it starts returning strange values (random numbers, different from 32 or 200) O.o.
I read that the number of threads chosen when configuring the kernel launch affects this, and also that the maximum number of threads is 512… Testing, I see that with 512 (or more) threads it goes wrong at element 255, with 256 it goes wrong at 127, with 128 at 63, and so on (always half).
So… how can I handle arrays of millions of elements?
Thanks for your help :)
Edit: I tried changing the number of blocks in the kernel launch parameters, but it doesn’t change the results… what is that parameter for?
myKernel<<<numBlocks, threadsPerBlock>>>
If you only use threadIdx.x, then you are only looking at the thread id within that block.
To get the overall thread id, you need to take blockIdx into account as well.
Something like:
const int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
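A host-side C++ sketch (not a CUDA kernel; the function names are hypothetical) can show what that formula buys you. It passes the built-in variables in as plain integers, enumerates every thread of a small grid, and counts how many distinct array elements get touched. With threadIdx.x alone, every block lands on the same elements 0 to blockDim.x - 1 no matter how many blocks you launch, while the full formula gives each thread its own element.

```cpp
#include <cassert>
#include <set>

// Mirrors (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x
// for a 2D grid of 1D blocks; the arguments stand in for CUDA's built-in variables.
int globalIndex2D(int blockIdxX, int blockIdxY, int threadIdxX,
                  int blockDimX, int gridDimX) {
    return blockIdxY * blockDimX * gridDimX + blockIdxX * blockDimX + threadIdxX;
}

// Enumerates every thread of a gridDimX x gridDimY grid of blockDimX-thread blocks and
// returns how many distinct indices are produced, using either threadIdx.x alone
// or the full global index.
int distinctElements(int gridDimX, int gridDimY, int blockDimX, bool useGlobalIndex) {
    assert(gridDimX > 0 && gridDimY > 0 && blockDimX > 0);
    std::set<int> touched;
    for (int by = 0; by < gridDimY; ++by)
        for (int bx = 0; bx < gridDimX; ++bx)
            for (int tx = 0; tx < blockDimX; ++tx)
                touched.insert(useGlobalIndex
                                   ? globalIndex2D(bx, by, tx, blockDimX, gridDimX)
                                   : tx);
    return static_cast<int>(touched.size());
}
```

Going from 4 to 8 blocks of 512 threads leaves the threadIdx.x-only count at 512, which is consistent with the observation above that changing the number of blocks did not change the results.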
Ailleur:
Thank you very much!! :D I will try it and I’ll come back with news.
Well, it looks like it didn’t work for me :( I simplified the kernel.
Edit: I changed my kernel again… now it takes (block * blockDim) + thread. My kernels use 512 threads per block and X blocks, depending on the size of my array.
__global__ void worldUpdKernel( MyItems * Darr, int * id )
{
    int globalIdx = ((blockIdx.x))+threadIdx.x;
    id[i]=i;
}
When I print the values of id with printf outside the kernel (on the host), I get results like this:
iresult[0]= 0
iresult[1]= 2
iresult[2]= 4
…
iresult[254]= 508
iresult[255]= 510
iresult[256]= 0
iresult[257]= 1
iresult[258]= 0
iresult[259]= 0
iresult[260]= -255
iresult[261]= 0
iresult[262]= 0
iresult[263]= -102195199
The results are double my index (why? I don’t know :S), but when I reach the value that should be thread id 512 (the start of a new block), it starts writing strange numbers :S
What I want is for my index “i” in the kernel to walk through my array, so I can access every element and apply a set of operations to each one. Something like this:
__global__ void worldUpdKernel( MyCell * Darr, bool * bolAr, int * id )
{
    int i = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    if (Darr[i].isMarked_for_death()) {
        id[i] = i;
        Darr[i].updateValues();
    }
    else {
        id[i] = 200;
        Darr[i].danceFlamenco();
    }
}
Thank you
((blockIdx.x))+threadIdx.x does not make sense.
Think of it as manually indexing a 2D array that is flat in memory.
Assume 1D thread blocks here for simplicity. Every block has blockDim.x threads. So if blockIdx.x is 1 and threadIdx.x is 0, you are NOT at thread (1 + 0) but at thread (1 * blockDim.x + 0).
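The same point as a quick host-side C++ check (the helper name is hypothetical, not from the thread): for a given array element, count how many threads of a 1D launch would write to it under each formula. With blockIdx.x + threadIdx.x, element 1 is claimed by both block 0/thread 1 and block 1/thread 0, and elements beyond (numBlocks - 1) + (blockDim.x - 1) are never reached; with blockIdx.x * blockDim.x + threadIdx.x every element gets exactly one writer.

```cpp
#include <cassert>

// Number of threads in a 1D launch (numBlocks blocks of blockDim threads) whose index
// equals `element`. multiplyByBlockDim selects blockIdx.x * blockDim.x + threadIdx.x;
// otherwise the questioned blockIdx.x + threadIdx.x is used.
int writersOf(int element, int numBlocks, int blockDim, bool multiplyByBlockDim) {
    assert(numBlocks > 0 && blockDim > 0);
    int count = 0;
    for (int b = 0; b < numBlocks; ++b)
        for (int t = 0; t < blockDim; ++t) {
            int idx = multiplyByBlockDim ? b * blockDim + t : b + t;
            if (idx == element) ++count;
        }
    return count;
}
```

With 2 blocks of 512 threads, the questioned formula never goes past element 512, so the second block overlaps the first instead of continuing after it.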
The first piece of code in your last message definitely does not compile (‘i’ does not exist), so it’s hard to say why you get that output.
Another thing you need to be careful about is always checking that you are not writing past the bounds of your array. Say you have 500 items in your array and thread blocks of 128 threads: you will need to launch 4 blocks, which is 512 threads in total. The last 12 threads don’t have any data to work on, and if they try to access the array, they will be out of bounds.
In pretty much every kernel of every CUDA project, you will find something like this near the first line of the kernel:
__global__ void myKernel(..., const int numItems)
{
    int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numItems)
    {
        return;
    }
    // ... from here on it is safe to access element idx ...
}
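The arithmetic in the 500-item example can be checked with a host-side C++ sketch that emulates the launch on the CPU. It is not a CUDA launch, and blocksFor and emulateGuardedLaunch are hypothetical names: blocksFor rounds the block count up, and emulateGuardedLaunch runs the guarded kernel body for every thread, writing idx into data[idx] only when idx < numItems.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed so that numBlocks * threadsPerBlock >= numItems (round up).
int blocksFor(int numItems, int threadsPerBlock) {
    assert(numItems >= 0 && threadsPerBlock > 0);
    return (numItems + threadsPerBlock - 1) / threadsPerBlock;
}

// Emulates a 1D launch of a kernel that starts with `if (idx >= numItems) return;`
// and then writes idx into data[idx]. Returns how many threads did real work.
int emulateGuardedLaunch(std::vector<int>& data, int threadsPerBlock) {
    const int numItems = static_cast<int>(data.size());
    const int numBlocks = blocksFor(numItems, threadsPerBlock);
    int working = 0;
    for (int b = 0; b < numBlocks; ++b)
        for (int t = 0; t < threadsPerBlock; ++t) {
            int idx = b * threadsPerBlock + t;
            if (idx >= numItems) continue;  // plays the role of the early return
            data[idx] = idx;                // always in bounds here
            ++working;
        }
    return working;
}
```

With 500 items and 128 threads per block this gives 4 blocks (512 threads): 500 threads do real work and the remaining 12 stop at the guard instead of writing past the end of the array.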
Hope this helps!
Ailleur:
I’ve tried a lot of things, but whenever I look at the first element of the second block, it goes crazy.
I edited my post several times to show the latest version of my kernel. The first versions used “i” as the name, but when I changed how the index is calculated I also changed the name :P (of course it’s correct in my actual code)
Ailleur:
HI!!!
I can have around 2 million elements in my array, but don’t worry, the number of blocks is computed dynamically from the number of elements in my array :) (the number of threads per block is always 512)
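A bit of arithmetic on that setup: about 2,000,000 elements at 512 threads per block needs 3,907 blocks (rounding up). That fits in gridDim.x on any GPU, but on older devices (compute capability below 3.0) gridDim.x is capped at 65,535, and a larger 1D problem then has to be spread over a 2D grid, which is what the blockIdx.y term in the suggested index formula accounts for. The host-side C++ sketch below uses hypothetical names and takes the limit as a parameter instead of querying the device; it splits a block count into x and y and checks that the formula continues seamlessly from one row of blocks to the next.

```cpp
#include <cassert>

struct Grid2D { int x; int y; };  // stand-in for the dim3 grid size (z omitted)

// Splits numBlocks into gridDim.x <= maxGridX and as many rows (gridDim.y) as needed.
// The last row may be partly unused, so the kernel still needs its idx >= numItems check.
Grid2D splitBlocks(int numBlocks, int maxGridX) {
    assert(numBlocks > 0 && maxGridX > 0);
    Grid2D g;
    g.x = numBlocks < maxGridX ? numBlocks : maxGridX;
    g.y = (numBlocks + g.x - 1) / g.x;
    return g;
}

// Same formula as in the thread:
// (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x
long long globalIdx(int bx, int by, int tx, int blockDimX, int gridDimX) {
    return static_cast<long long>(by) * blockDimX * gridDimX
         + static_cast<long long>(bx) * blockDimX + tx;
}
```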
So you suggest calculating the index that way (idx)? I tried it, and these are the results (I’m also posting the new version of my kernel again, hehe ;)
Kernel:
__global__ void worldUpdKernel( MyCell * Darr, bool * bolAr, int * id )
{
    int globalIdx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    id[globalIdx] = globalIdx;
}
Output
iresult[0]= 0
iresult[1]= 2
iresult[2]= 4
…
iresult[373]= 746
iresult[374]= 748
iresult[375]= 710
iresult[376]= 712
iresult[377]= 714
iresult[378]= 716
iresult[379]= 718
iresult[380]= 720
iresult[381]= 722
iresult[382]= 724
iresult[383]= 726
iresult[384]= 728
iresult[385]= 730
iresult[386]= 732
iresult[387]= 734
iresult[388]= 736
iresult[389]= 738
iresult[390]= 740
iresult[391]= 742
iresult[392]= 728
iresult[393]= 730
iresult[394]= 732
iresult[395]= 734
iresult[396]= 736
iresult[397]= 738
iresult[398]= 740
iresult[399]= 730
iresult[400]= 732
iresult[401]= 734
iresult[402]= 736
iresult[403]= 738
iresult[404]= 724
iresult[405]= 726
iresult[406]= 728
iresult[407]= 730
iresult[408]= 732
iresult[409]= 734
iresult[410]= 736
iresult[411]= 726
iresult[412]= 728
iresult[413]= 730
iresult[414]= 732
iresult[415]= 734
iresult[416]= 724
iresult[417]= 726
iresult[418]= 728
iresult[419]= 730
iresult[420]= 732
iresult[421]= 718
iresult[422]= 720
iresult[423]= 722
iresult[424]= 724
iresult[425]= 726
iresult[426]= 728
iresult[427]= 730
iresult[428]= 716
iresult[429]= 718
iresult[430]= 720
iresult[431]= 722
iresult[432]= 724
iresult[433]= 726
iresult[434]= 728
iresult[435]= 718
iresult[436]= 720
iresult[437]= 722
iresult[438]= 724
iresult[439]= 726
iresult[440]= 700
iresult[441]= 702
iresult[442]= 704
iresult[443]= 706
iresult[444]= 708
iresult[445]= 710
iresult[446]= 712
iresult[447]= 714
iresult[448]= 716
iresult[449]= 718
iresult[450]= 720
iresult[451]= 722
iresult[452]= 712
iresult[453]= 714
iresult[454]= 716
iresult[455]= 718
iresult[456]= 720
iresult[457]= 706
iresult[458]= 708
iresult[459]= 710
iresult[460]= 712
iresult[461]= 714
iresult[462]= 716
iresult[463]= 718
iresult[464]= 704
iresult[465]= 706
iresult[466]= 708
iresult[467]= 710
iresult[468]= 712
iresult[469]= 714
iresult[470]= 716
iresult[471]= 706
iresult[472]= 708
iresult[473]= 710
iresult[474]= 712
iresult[475]= 714
iresult[476]= 704
iresult[477]= 706
iresult[478]= 708
iresult[479]= 710
iresult[480]= 712
iresult[481]= 686
iresult[482]= 688
iresult[483]= 690
iresult[484]= 692
iresult[485]= 694
iresult[486]= 696
iresult[487]= 698
iresult[488]= 700
iresult[489]= 702
iresult[490]= 704
iresult[491]= 706
iresult[492]= 708
iresult[493]= 694
iresult[494]= 696
iresult[495]= 698
iresult[496]= 9
iresult[497]= 2
iresult[498]= 0
iresult[499]= 176
iresult[500]= 0
iresult[501]= 0
iresult[502]= 0
iresult[503]= 9
iresult[504]= 91
iresult[505]= 119
iresult[506]= 39
iresult[507]= 0
iresult[508]= 1018929152
The array had around 749 elements (I’m not posting the rest of the output, just the part where it starts going crazy).
You shouldn’t be using both the .x and .y components unless you have a 2D array, and even then, it’s easier to just write it all as a 1D array, which is what I guess the blockDim.x and .y variables do.
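If the data really is 2D, writing it as a 1D array usually means storing it row-major in one flat array. The host-side C++ sketch below (hypothetical names; width and height describe the data, not the grid) shows that mapping and the per-thread bounds check a 2D launch would need, with col = blockIdx.x * blockDim.x + threadIdx.x and row = blockIdx.y * blockDim.y + threadIdx.y.

```cpp
#include <cassert>

// Row-major flattening: element (row, col) of a width-wide 2D array lives at row * width + col.
int flatIndex(int row, int col, int width) {
    return row * width + col;
}

// Per-thread logic for a 2D launch: returns the flat index this thread should handle,
// or -1 if the thread falls outside the width x height data and should do nothing.
int indexFor2DThread(int blockIdxX, int blockIdxY, int threadIdxX, int threadIdxY,
                     int blockDimX, int blockDimY, int width, int height) {
    assert(width > 0 && height > 0);
    int col = blockIdxX * blockDimX + threadIdxX;
    int row = blockIdxY * blockDimY + threadIdxY;
    if (col >= width || row >= height) return -1;  // both dimensions need a bounds check
    return flatIndex(row, col, width);
}
```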
Here’s a better example:
#include <thrust/device_vector.h>

// The kernel has to be declared before the host code that launches it.
__global__
void kernel(int *arr, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) // check if i is out of array bounds
        return;
    arr[i] = /* ... */;
}

int main() {
    thrust::device_vector<int> arr;
    arr.reserve(2000000); // 2 million
    // initialize arr (make sure to use push_back to change size)
    int tpb = 32; // threads per block
    int blocks = arr.size() / tpb + (arr.size() % tpb ? 1 : 0);
    kernel<<< blocks, tpb >>>(thrust::raw_pointer_cast(arr.data()), arr.size());
    /* ... */
}
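The block-count line in this example rounds up, so it is equivalent to the common (n + tpb - 1) / tpb idiom. A small host-side C++ check (function names are illustrative only):

```cpp
#include <cassert>
#include <cstddef>

// Form used in the example above.
std::size_t blocksRemainder(std::size_t n, std::size_t tpb) {
    assert(tpb > 0);
    return n / tpb + (n % tpb ? 1 : 0);
}

// Equivalent round-up idiom (it can only overflow if n is within tpb of SIZE_MAX).
std::size_t blocksRoundUp(std::size_t n, std::size_t tpb) {
    assert(tpb > 0);
    return (n + tpb - 1) / tpb;
}
```

For 2,000,000 elements and 32 threads per block both give 62,500 blocks. If the vector is still empty (for example after reserve without any push_back), both give 0, and a launch with a zero-sized grid fails with an invalid-configuration error, so that case is worth checking before launching.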
Hi.
I already got this working :) Thank you very much to everyone for your feedback :)