Kernel configuration: management of an array of 2 million positions

Hi.

I have a class that generates a set of objects in an array. For every object I run a set of operations, so I want to parallelize those functions across the cells.

The problem is that this array can hold millions of items O.o!!! I looked at some samples of array handling, and I found something like this:

__global__ void worldUpdKernel( MyItems * Darr,  int * id )
{
    int i = threadIdx.x;

    if(Darr[i].marked_for_death()){
    
        id[i]=32+Darr[i].cellID;
    }
    else{
        id[i]=200;
    }
}

I notice that they use threadIdx.x to walk through the array and do the different operations: in this case, checking whether the condition applies and writing a value into the array of ints. When I read back id[i] to check that my kernel behaves correctly, I see that from element 255 onward it returns strange values (random numbers, different from 32 or 200) O.o.

I read that the number of threads used when configuring the kernel affects this, and I also read that the maximum number of threads is 512… Testing, I see that when I use 512 (or more) it gets stuck at element 255, when I use 256 it gets stuck at 127, with 128 at 63… etc. (always half).

So…how could I manage arrays of millions of elements?

Thanks for your help :)

Edit: I tried to change the block values in the parameters of the kernel, but it doesn’t change the results…what is the utility of that parameter?

myKernel <<<numBlocks,threadsPerBlock>>>

If you only make use of threadIdx.x, then you are only looking at the thread index within its own block.
To get the overall (global) thread index, you need to take the block index (blockIdx) into account as well.

Something like

const int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
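A rough host-side sketch of that formula, in case it helps to see it outside a kernel. It is plain C++ (no GPU needed); `Dim2` and `flat_index` are made-up names for illustration, standing in for CUDA's built-in `blockIdx`, `gridDim`, `blockDim.x` and `threadIdx.x`, and it assumes 1D blocks inside a grid that may be 1D or 2D.

```cpp
#include <cassert>

// Hypothetical stand-in for the x/y parts of CUDA's dim3 built-ins.
struct Dim2 {
    int x;
    int y;
};

// Same arithmetic as the formula above:
//   blockIdx.y * blockDim.x * gridDim.x  -> skips whole rows of blocks
//   blockIdx.x * blockDim.x              -> skips whole blocks within the row
//   threadIdx.x                          -> position inside the block
int flat_index(Dim2 block_idx, Dim2 grid_dim, int block_dim_x, int thread_idx_x) {
    assert(block_idx.x >= 0 && block_idx.x < grid_dim.x);
    assert(block_idx.y >= 0 && block_idx.y < grid_dim.y);
    assert(thread_idx_x >= 0 && thread_idx_x < block_dim_x);
    return block_idx.y * block_dim_x * grid_dim.x
         + block_idx.x * block_dim_x
         + thread_idx_x;
}
```

With a 1D grid (`grid_dim.y == 1`) the block's y index is always 0, so the first term vanishes and the expression reduces to block index times block size plus thread index.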

Thank you very much!! :D I will try it and I’ll come back with news.

Well, it looks like it didn’t work for me :( I changed the kernel to make it simpler.

Edit: I changed my kernel again… now it uses (block*Dim) + thread. I launch it with 512 threads per block and X blocks, depending on the size of my array.

__global__ void worldUpdKernel( MyItems * Darr, int * id )
    {
    int globalIdx = ((blockIdx.x))+threadIdx.x;
    id[i]=i;
    
    }

When I add a printf outside the kernel to show the values in *id, I get results like this:

iresult[0]= 0
  iresult[1]= 2
  iresult[2]= 4
  iresult[3]= 6
  iresult[4]= 8
  iresult[5]= 10
  iresult[6]= 12
  iresult[7]= 14
  iresult[8]= 16
  iresult[9]= 18
  iresult[10]= 20
  iresult[11]= 22
  iresult[12]= 24
  iresult[13]= 26
  iresult[14]= 28
  iresult[15]= 30
  iresult[16]= 32
  iresult[17]= 34
  iresult[18]= 36
  iresult[19]= 38
  iresult[20]= 40
  iresult[21]= 42
  iresult[22]= 44
  iresult[23]= 46
  iresult[24]= 48
  iresult[25]= 50
  iresult[26]= 52
  iresult[27]= 54
  iresult[28]= 56
  iresult[29]= 58
  iresult[30]= 60
  iresult[31]= 62
  iresult[32]= 64
  iresult[33]= 66
  iresult[34]= 68
  iresult[35]= 70
  iresult[36]= 72
  iresult[37]= 74
  iresult[38]= 76
  iresult[39]= 78
  iresult[40]= 80
  iresult[41]= 82
  iresult[42]= 84
  iresult[43]= 86
  iresult[44]= 88
  iresult[45]= 90
  iresult[46]= 92
  iresult[47]= 94
  iresult[48]= 96
  iresult[49]= 98
  iresult[50]= 100
  iresult[51]= 102
  iresult[52]= 104
  iresult[53]= 106
  iresult[54]= 108
  iresult[55]= 110
  iresult[56]= 112
  iresult[57]= 114
  iresult[58]= 116
  iresult[59]= 118
  iresult[60]= 120
  iresult[61]= 122
  iresult[62]= 124
  iresult[63]= 126
  iresult[64]= 128
  iresult[65]= 130
  iresult[66]= 132
  iresult[67]= 134
  iresult[68]= 136
  iresult[69]= 138
  iresult[70]= 140
  iresult[71]= 142
  iresult[72]= 144
  iresult[73]= 146
  iresult[74]= 148
  iresult[75]= 150
  iresult[76]= 152
  iresult[77]= 154
  iresult[78]= 156
  iresult[79]= 158
  iresult[80]= 160
  iresult[81]= 162
  iresult[82]= 164
  iresult[83]= 166
  iresult[84]= 168
  iresult[85]= 170
  iresult[86]= 172
  iresult[87]= 174
  iresult[88]= 176
  iresult[89]= 178
  iresult[90]= 180
  iresult[91]= 182
  iresult[92]= 184
  iresult[93]= 186
  iresult[94]= 188
  iresult[95]= 190
  iresult[96]= 192
  iresult[97]= 194
  iresult[98]= 196
  iresult[99]= 198
  iresult[100]= 200
  iresult[101]= 202
  iresult[102]= 204
  iresult[103]= 206
  iresult[104]= 208
  iresult[105]= 210
  iresult[106]= 212
  iresult[107]= 214
  iresult[108]= 216
  iresult[109]= 218
  iresult[110]= 220
  iresult[111]= 222
  iresult[112]= 224
  iresult[113]= 226
  iresult[114]= 228
  iresult[115]= 230
  iresult[116]= 232
  iresult[117]= 234
  iresult[118]= 236
  iresult[119]= 238
  iresult[120]= 240
  iresult[121]= 242
  iresult[122]= 244
  iresult[123]= 246
  iresult[124]= 248
  iresult[125]= 250
  iresult[126]= 252
  iresult[127]= 254
  iresult[128]= 256
  iresult[129]= 258
  iresult[130]= 260
  iresult[131]= 262
  iresult[132]= 264
  iresult[133]= 266
  iresult[134]= 268
  iresult[135]= 270
  iresult[136]= 272
  iresult[137]= 274
  iresult[138]= 276
  iresult[139]= 278
  iresult[140]= 280
  iresult[141]= 282
  iresult[142]= 284
  iresult[143]= 286
  iresult[144]= 288
  iresult[145]= 290
  iresult[146]= 292
  iresult[147]= 294
  iresult[148]= 296
  iresult[149]= 298
  iresult[150]= 300
  iresult[151]= 302
  iresult[152]= 304
  iresult[153]= 306
  iresult[154]= 308
  iresult[155]= 310
  iresult[156]= 312
  iresult[157]= 314
  iresult[158]= 316
  iresult[159]= 318
  iresult[160]= 320
  iresult[161]= 322
  iresult[162]= 324
  iresult[163]= 326
  iresult[164]= 328
  iresult[165]= 330
  iresult[166]= 332
  iresult[167]= 334
  iresult[168]= 336
  iresult[169]= 338
  iresult[170]= 340
  iresult[171]= 342
  iresult[172]= 344
  iresult[173]= 346
  iresult[174]= 348
  iresult[175]= 350
  iresult[176]= 352
  iresult[177]= 354
  iresult[178]= 356
  iresult[179]= 358
  iresult[180]= 360
  iresult[181]= 362
  iresult[182]= 364
  iresult[183]= 366
  iresult[184]= 368
  iresult[185]= 370
  iresult[186]= 372
  iresult[187]= 374
  iresult[188]= 376
  iresult[189]= 378
  iresult[190]= 380
  iresult[191]= 382
  iresult[192]= 384
  iresult[193]= 386
  iresult[194]= 388
  iresult[195]= 390
  iresult[196]= 392
  iresult[197]= 394
  iresult[198]= 396
  iresult[199]= 398
  iresult[200]= 400
  iresult[201]= 402
  iresult[202]= 404
  iresult[203]= 406
  iresult[204]= 408
  iresult[205]= 410
  iresult[206]= 412
  iresult[207]= 414
  iresult[208]= 416
  iresult[209]= 418
  iresult[210]= 420
  iresult[211]= 422
  iresult[212]= 424
  iresult[213]= 426
  iresult[214]= 428
  iresult[215]= 430
  iresult[216]= 432
  iresult[217]= 434
  iresult[218]= 436
  iresult[219]= 438
  iresult[220]= 440
  iresult[221]= 442
  iresult[222]= 444
  iresult[223]= 446
  iresult[224]= 448
  iresult[225]= 450
  iresult[226]= 452
  iresult[227]= 454
  iresult[228]= 456
  iresult[229]= 458
  iresult[230]= 460
  iresult[231]= 462
  iresult[232]= 464
  iresult[233]= 466
  iresult[234]= 468
  iresult[235]= 470
  iresult[236]= 472
  iresult[237]= 474
  iresult[238]= 476
  iresult[239]= 478
  iresult[240]= 480
  iresult[241]= 482
  iresult[242]= 484
  iresult[243]= 486
  iresult[244]= 488
  iresult[245]= 490
  iresult[246]= 492
  iresult[247]= 494
  iresult[248]= 496
  iresult[249]= 498
  iresult[250]= 500
  iresult[251]= 502
  iresult[252]= 504
  iresult[253]= 506
  iresult[254]= 508
  iresult[255]= 510
  iresult[256]= 0
  iresult[257]= 1
  iresult[258]= 0
  iresult[259]= 0
  iresult[260]= -255
  iresult[261]= 0
  iresult[262]= 0
  iresult[263]= -102195199

The results are double my index (why? I don’t know :S), and when I reach the value that should be thread ID 512 (the beginning of a new block) it starts producing strange numbers :S

I want the index “i” in the kernel to walk through my array, so that I can access every element and run a set of operations on each one. Something like this:

__global__ void worldUpdKernel( MyCell * Darr, bool * bolAr, int * id )
{
    int i =(blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    if(Darr[i].isMarked_for_death()){
        id[i]=i;
        Darr[i].updateValues();
    }
    else{
        id[i]=200;   
        Darr[i].danceFlamenco();
    }
}

Thank you

((blockIdx.x))+threadIdx.x does not make sense.
Think of it as manually indexing a 2d array that is flat in memory.

Assuming a 1d thread block dimension here for simplicity. For every block, you have blockDim.x threads. Thus, if blockIdx.x is 1 and threadIdx is 0, you are NOT at threadId (1 + 0) but rather at thread (1*blockDim.x + 0).
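To make the "flat 2D array" picture a bit more concrete, here is a small host-side sketch in plain C++ (nothing runs on the GPU). The names `BlockThread`, `owner_of` and `index_of` are invented for this example: the block acts like the row and the thread like the column of a row-major array whose row length is the block size.

```cpp
#include <cassert>

// Hypothetical pair: which block and which thread inside it (1D case only).
struct BlockThread {
    int block;   // plays the role of blockIdx.x (the "row")
    int thread;  // plays the role of threadIdx.x (the "column")
};

// Forward mapping: what each thread should compute inside the kernel.
int index_of(BlockThread bt, int block_dim) {
    assert(bt.thread >= 0 && bt.thread < block_dim);
    return bt.block * block_dim + bt.thread;
}

// Inverse mapping: which block/thread is responsible for a given element.
BlockThread owner_of(int global_idx, int block_dim) {
    assert(global_idx >= 0 && block_dim > 0);
    return BlockThread{global_idx / block_dim, global_idx % block_dim};
}
```

For instance, with 512 threads per block, element 512 belongs to block 1, thread 0, whereas adding the two indices without the multiplication would give 1 and collide with block 0, thread 1.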

The first piece of code you have posted in your last message is definitely not something that compiles (‘i’ does not exist), so hard to say why you get the output you get.

Another thing that you will need to be careful with is to always check that you are not writing past the bounds of your array. Say you have 500 items in your array and thread blocks of 128 threads: you will need to launch 4 blocks, which gives 512 threads in total. Those last 12 threads don’t have any data to work on, and if they try to access the array, they will be out of bounds.
In pretty much every kernel of every cuda project, you will find that somewhere near the first line of the kernel you have something that looks like:

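__global__ void myKernel(..., const int numItems)
{
      // Global index for a 2D grid of 1D blocks
      int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
      if (idx >= numItems)
      {
             return; // surplus thread: no element to process
      }
      // ... work on element idx ...
}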
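The block count that goes with this guard is a rounded-up division. A minimal host-side sketch in plain C++, with `blocks_needed` and `idle_threads` as made-up helper names (they are not part of the CUDA API):

```cpp
#include <cassert>

// Number of blocks needed so that blocks * threads_per_block >= num_items
// (integer ceiling division).
int blocks_needed(int num_items, int threads_per_block) {
    assert(num_items >= 0 && threads_per_block > 0);
    return (num_items + threads_per_block - 1) / threads_per_block;
}

// Threads in the last block that have no element and must return early.
int idle_threads(int num_items, int threads_per_block) {
    return blocks_needed(num_items, threads_per_block) * threads_per_block
         - num_items;
}
```

For the 500-item example with 128 threads per block this gives 4 blocks and 12 idle threads; for 2,000,000 items with 512 threads per block it gives 3907 blocks and 384 idle threads.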

Hope this helps!

I tried a lot of things, but whenever I get to the first element of the second block, it goes crazy.

I edited it several times, inserting the latest version of my kernel. The first versions used “i” as the name, but when I changed how that index is calculated I also changed the name :P (of course it is consistent in my actual code).

HI!!!

I can have around 2 million elements in my array, but don’t worry, the number of blocks is dynamic and depends on the number of elements in my array :) (the number of threads per block is always 512).

So you suggest calculating the index (idx) that way? I tried it and these are the results (I’m also posting the new version of my kernel again, hehe ;)

Kernel;)

__global__ void worldUpdKernel( MyCell * Darr, bool * bolAr, int * id )
{
     int globalIdx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
     id[globalIdx]=globalIdx;
 
}

Output

iresult[0]= 0
  iresult[1]= 2
  iresult[2]= 4
  iresult[3]= 6
  iresult[4]= 8
  iresult[5]= 10
  iresult[6]= 12
  iresult[7]= 14
  iresult[8]= 16
  iresult[9]= 18
  iresult[10]= 20
  iresult[11]= 22
  iresult[12]= 24
  iresult[13]= 26
  iresult[14]= 28
  iresult[15]= 30
  iresult[16]= 32
  iresult[17]= 34
  iresult[18]= 36
  iresult[19]= 38
  iresult[20]= 40
  iresult[21]= 42
  iresult[22]= 44
  iresult[23]= 46
  iresult[24]= 48
  iresult[25]= 50
  iresult[26]= 52
  iresult[27]= 54
  iresult[28]= 56
  iresult[29]= 58
  iresult[30]= 60
  iresult[31]= 62
  iresult[32]= 64
  iresult[33]= 66
  iresult[34]= 68
  iresult[35]= 70
  iresult[36]= 72
  iresult[37]= 74
  iresult[38]= 76
  iresult[39]= 78
  iresult[40]= 80
  iresult[41]= 82
  iresult[42]= 84
  iresult[43]= 86
  iresult[44]= 88
  iresult[45]= 90
  iresult[46]= 92
  iresult[47]= 94
  iresult[48]= 96
  iresult[49]= 98
  iresult[50]= 100
  iresult[51]= 102
  iresult[52]= 104
  iresult[53]= 106
  iresult[54]= 108
  iresult[55]= 110
  iresult[56]= 112
  iresult[57]= 114
  iresult[58]= 116
  iresult[59]= 118
  iresult[60]= 120
  iresult[61]= 122
  iresult[62]= 124
  iresult[63]= 126
  iresult[64]= 128
  iresult[65]= 130
  iresult[66]= 132
  iresult[67]= 134
  iresult[68]= 136
  iresult[69]= 138
  iresult[70]= 140
  iresult[71]= 142
  iresult[72]= 144
  iresult[73]= 146
  iresult[74]= 148
  iresult[75]= 150
  iresult[76]= 152
  iresult[77]= 154
  iresult[78]= 156
  iresult[79]= 158
  iresult[80]= 160
  iresult[81]= 162
  iresult[82]= 164
  iresult[83]= 166
  iresult[84]= 168
  iresult[85]= 170
  iresult[86]= 172
  iresult[87]= 174
  iresult[88]= 176
  iresult[89]= 178
  iresult[90]= 180
  iresult[91]= 182
  iresult[92]= 184
  iresult[93]= 186
  iresult[94]= 188
  iresult[95]= 190
  iresult[96]= 192
  iresult[97]= 194
  iresult[98]= 196
  iresult[99]= 198
  iresult[100]= 200
  iresult[101]= 202
  iresult[102]= 204
  iresult[103]= 206
  iresult[104]= 208
  iresult[105]= 210
  iresult[106]= 212
  iresult[107]= 214
  iresult[108]= 216
  iresult[109]= 218
  iresult[110]= 220
  iresult[111]= 222
  iresult[112]= 224
  iresult[113]= 226
  iresult[114]= 228
  iresult[115]= 230
  iresult[116]= 232
  iresult[117]= 234
  iresult[118]= 236
  iresult[119]= 238
  iresult[120]= 240
  iresult[121]= 242
  iresult[122]= 244
  iresult[123]= 246
  iresult[124]= 248
  iresult[125]= 250
  iresult[126]= 252
  iresult[127]= 254
  iresult[128]= 256
  iresult[129]= 258
  iresult[130]= 260
  iresult[131]= 262
  iresult[132]= 264
  iresult[133]= 266
  iresult[134]= 268
  iresult[135]= 270
  iresult[136]= 272
  iresult[137]= 274
  iresult[138]= 276
  iresult[139]= 278
  iresult[140]= 280
  iresult[141]= 282
  iresult[142]= 284
  iresult[143]= 286
  iresult[144]= 288
  iresult[145]= 290
  iresult[146]= 292
  iresult[147]= 294
  iresult[148]= 296
  iresult[149]= 298
  iresult[150]= 300
  iresult[151]= 302
  iresult[152]= 304
  iresult[153]= 306
  iresult[154]= 308
  iresult[155]= 310
  iresult[156]= 312
  iresult[157]= 314
  iresult[158]= 316
  iresult[159]= 318
  iresult[160]= 320
  iresult[161]= 322
  iresult[162]= 324
  iresult[163]= 326
  iresult[164]= 328
  iresult[165]= 330
  iresult[166]= 332
  iresult[167]= 334
  iresult[168]= 336
  iresult[169]= 338
  iresult[170]= 340
  iresult[171]= 342
  iresult[172]= 344
  iresult[173]= 346
  iresult[174]= 348
  iresult[175]= 350
  iresult[176]= 352
  iresult[177]= 354
  iresult[178]= 356
  iresult[179]= 358
  iresult[180]= 360
  iresult[181]= 362
  iresult[182]= 364
  iresult[183]= 366
  iresult[184]= 368
  iresult[185]= 370
  iresult[186]= 372
  iresult[187]= 374
  iresult[188]= 376
  iresult[189]= 378
  iresult[190]= 380
  iresult[191]= 382
  iresult[192]= 384
  iresult[193]= 386
  iresult[194]= 388
  iresult[195]= 390
  iresult[196]= 392
  iresult[197]= 394
  iresult[198]= 396
  iresult[199]= 398
  iresult[200]= 400
  iresult[201]= 402
  iresult[202]= 404
  iresult[203]= 406
  iresult[204]= 408
  iresult[205]= 410
  iresult[206]= 412
  iresult[207]= 414
  iresult[208]= 416
  iresult[209]= 418
  iresult[210]= 420
  iresult[211]= 422
  iresult[212]= 424
  iresult[213]= 426
  iresult[214]= 428
  iresult[215]= 430
  iresult[216]= 432
  iresult[217]= 434
  iresult[218]= 436
  iresult[219]= 438
  iresult[220]= 440
  iresult[221]= 442
  iresult[222]= 444
  iresult[223]= 446
  iresult[224]= 448
  iresult[225]= 450
  iresult[226]= 452
  iresult[227]= 454
  iresult[228]= 456
  iresult[229]= 458
  iresult[230]= 460
  iresult[231]= 462
  iresult[232]= 464
  iresult[233]= 466
  iresult[234]= 468
  iresult[235]= 470
  iresult[236]= 472
  iresult[237]= 474
  iresult[238]= 476
  iresult[239]= 478
  iresult[240]= 480
  iresult[241]= 482
  iresult[242]= 484
  iresult[243]= 486
  iresult[244]= 488
  iresult[245]= 490
  iresult[246]= 492
  iresult[247]= 494
  iresult[248]= 496
  iresult[249]= 498
  iresult[250]= 500
  iresult[251]= 502
  iresult[252]= 504
  iresult[253]= 506
  iresult[254]= 508
  iresult[255]= 510
  iresult[256]= 512
  iresult[257]= 514
  iresult[258]= 516
  iresult[259]= 518
  iresult[260]= 520
  iresult[261]= 522
  iresult[262]= 524
  iresult[263]= 526
  iresult[264]= 528
  iresult[265]= 530
  iresult[266]= 532
  iresult[267]= 534
  iresult[268]= 536
  iresult[269]= 538
  iresult[270]= 540
  iresult[271]= 542
  iresult[272]= 544
  iresult[273]= 546
  iresult[274]= 548
  iresult[275]= 550
  iresult[276]= 552
  iresult[277]= 554
  iresult[278]= 556
  iresult[279]= 558
  iresult[280]= 560
  iresult[281]= 562
  iresult[282]= 564
  iresult[283]= 566
  iresult[284]= 568
  iresult[285]= 570
  iresult[286]= 572
  iresult[287]= 574
  iresult[288]= 576
  iresult[289]= 578
  iresult[290]= 580
  iresult[291]= 582
  iresult[292]= 584
  iresult[293]= 586
  iresult[294]= 588
  iresult[295]= 590
  iresult[296]= 592
  iresult[297]= 594
  iresult[298]= 596
  iresult[299]= 598
  iresult[300]= 600
  iresult[301]= 602
  iresult[302]= 604
  iresult[303]= 606
  iresult[304]= 608
  iresult[305]= 610
  iresult[306]= 612
  iresult[307]= 614
  iresult[308]= 616
  iresult[309]= 618
  iresult[310]= 620
  iresult[311]= 622
  iresult[312]= 624
  iresult[313]= 626
  iresult[314]= 628
  iresult[315]= 630
  iresult[316]= 632
  iresult[317]= 634
  iresult[318]= 636
  iresult[319]= 638
  iresult[320]= 640
  iresult[321]= 642
  iresult[322]= 644
  iresult[323]= 646
  iresult[324]= 648
  iresult[325]= 650
  iresult[326]= 652
  iresult[327]= 654
  iresult[328]= 656
  iresult[329]= 658
  iresult[330]= 660
  iresult[331]= 662
  iresult[332]= 664
  iresult[333]= 666
  iresult[334]= 668
  iresult[335]= 670
  iresult[336]= 672
  iresult[337]= 674
  iresult[338]= 676
  iresult[339]= 678
  iresult[340]= 680
  iresult[341]= 682
  iresult[342]= 684
  iresult[343]= 686
  iresult[344]= 688
  iresult[345]= 690
  iresult[346]= 692
  iresult[347]= 694
  iresult[348]= 696
  iresult[349]= 698
  iresult[350]= 700
  iresult[351]= 702
  iresult[352]= 704
  iresult[353]= 706
  iresult[354]= 708
  iresult[355]= 710
  iresult[356]= 712
  iresult[357]= 714
  iresult[358]= 716
  iresult[359]= 718
  iresult[360]= 720
  iresult[361]= 722
  iresult[362]= 724
  iresult[363]= 726
  iresult[364]= 728
  iresult[365]= 730
  iresult[366]= 732
  iresult[367]= 734
  iresult[368]= 736
  iresult[369]= 738
  iresult[370]= 740
  iresult[371]= 742
  iresult[372]= 744
  iresult[373]= 746
  iresult[374]= 748
  iresult[375]= 710
  iresult[376]= 712
  iresult[377]= 714
  iresult[378]= 716
  iresult[379]= 718
  iresult[380]= 720
  iresult[381]= 722
  iresult[382]= 724
  iresult[383]= 726
  iresult[384]= 728
  iresult[385]= 730
  iresult[386]= 732
  iresult[387]= 734
  iresult[388]= 736
  iresult[389]= 738
  iresult[390]= 740
  iresult[391]= 742
  iresult[392]= 728
  iresult[393]= 730
  iresult[394]= 732
  iresult[395]= 734
  iresult[396]= 736
  iresult[397]= 738
  iresult[398]= 740
  iresult[399]= 730
  iresult[400]= 732
  iresult[401]= 734
  iresult[402]= 736
  iresult[403]= 738
  iresult[404]= 724
  iresult[405]= 726
  iresult[406]= 728
  iresult[407]= 730
  iresult[408]= 732
  iresult[409]= 734
  iresult[410]= 736
  iresult[411]= 726
  iresult[412]= 728
  iresult[413]= 730
  iresult[414]= 732
  iresult[415]= 734
  iresult[416]= 724
  iresult[417]= 726
  iresult[418]= 728
  iresult[419]= 730
  iresult[420]= 732
  iresult[421]= 718
  iresult[422]= 720
  iresult[423]= 722
  iresult[424]= 724
  iresult[425]= 726
  iresult[426]= 728
  iresult[427]= 730
  iresult[428]= 716
  iresult[429]= 718
  iresult[430]= 720
  iresult[431]= 722
  iresult[432]= 724
  iresult[433]= 726
  iresult[434]= 728
  iresult[435]= 718
  iresult[436]= 720
  iresult[437]= 722
  iresult[438]= 724
  iresult[439]= 726
  iresult[440]= 700
  iresult[441]= 702
  iresult[442]= 704
  iresult[443]= 706
  iresult[444]= 708
  iresult[445]= 710
  iresult[446]= 712
  iresult[447]= 714
  iresult[448]= 716
  iresult[449]= 718
  iresult[450]= 720
  iresult[451]= 722
  iresult[452]= 712
  iresult[453]= 714
  iresult[454]= 716
  iresult[455]= 718
  iresult[456]= 720
  iresult[457]= 706
  iresult[458]= 708
  iresult[459]= 710
  iresult[460]= 712
  iresult[461]= 714
  iresult[462]= 716
  iresult[463]= 718
  iresult[464]= 704
  iresult[465]= 706
  iresult[466]= 708
  iresult[467]= 710
  iresult[468]= 712
  iresult[469]= 714
  iresult[470]= 716
  iresult[471]= 706
  iresult[472]= 708
  iresult[473]= 710
  iresult[474]= 712
  iresult[475]= 714
  iresult[476]= 704
  iresult[477]= 706
  iresult[478]= 708
  iresult[479]= 710
  iresult[480]= 712
  iresult[481]= 686
  iresult[482]= 688
  iresult[483]= 690
  iresult[484]= 692
  iresult[485]= 694
  iresult[486]= 696
  iresult[487]= 698
  iresult[488]= 700
  iresult[489]= 702
  iresult[490]= 704
  iresult[491]= 706
  iresult[492]= 708
  iresult[493]= 694
  iresult[494]= 696
  iresult[495]= 698
  iresult[496]= 9
  iresult[497]= 2
  iresult[498]= 0
  iresult[499]= 176
  iresult[500]= 0
  iresult[501]= 0
  iresult[502]= 0
  iresult[503]= 9
  iresult[504]= 91
  iresult[505]= 119
  iresult[506]= 39
  iresult[507]= 0
  iresult[508]= 1018929152

The array had around 749 elements (I’m not including the rest of the output, just the part where it starts going crazy).

You shouldn’t need both the .x and .y components unless you launch a 2D grid, and even then it’s usually easier to treat the data as a 1D array and compute a single flat index, which I guess is what the blockIdx.y and gridDim.x terms are doing there.

Here’s a better example :

thrust::device_vector<int> arr;
arr.reserve(2000000); // 2 million

// initialize arr (make sure to use push_back to change size)

int tpb = 32; // threads per block
int blocks = arr.size() / tpb + (arr.size() % tpb ? 1 : 0);

kernel<<< blocks, tpb >>>(thrust::raw_pointer_cast(arr.data()), arr.size());

/* ... */

__global__
void kernel(int *arr, int n) {

     int i = blockIdx.x * blockDim.x + threadIdx.x;

     if (i >= n) // check if i is out of array bounds
         return;

     arr[i] = /* ... */

     return;
}

Hi.

I already got this working :) Thank you very much to everyone for your feedback :)