Loop trip count

I verified with NVVP that the loop trip count variable (the integer count in the example below) is never transferred to the device as a host-to-device data transfer, for any of my parallel loops. I first suspected that the compiler had somehow “hardcoded” the loop count into the kernel, so that no transfer was needed. However, NVVP shows no transfer even when I read the trip count from a command-line parameter at runtime, where the compiler can’t “hardcode” anything.

int count = 10;
#pragma acc parallel loop
for(int i=0; i<count; ++i){
    // loop body
}

How does the device “get” the loop trip count value then?
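For reference, here is a minimal C sketch of the runtime variant described in the question. The trip count is an ordinary runtime value (for example, parsed from the command line), so the compiler cannot fold it into the kernel as a constant. The function names scale_first_n and trip_count_from_args, and the copy(a[0:count]) clause, are illustrative assumptions, not code from the original post. If the file is built without OpenACC support (e.g. plain gcc without -fopenacc), the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: doubles the first `count` elements of `a`.
   `count` is a runtime value supplied by the caller, so it cannot be
   treated as a compile-time constant. */
void scale_first_n(float *a, int count) {
    assert(count >= 0 && (a != NULL || count == 0));
    /* Compiled without OpenACC support, this pragma is ignored and the
       loop runs on the host instead of the device. */
    #pragma acc parallel loop copy(a[0:count])
    for (int i = 0; i < count; ++i) {
        a[i] *= 2.0f;
    }
}

/* Hypothetical parser: returns the trip count given as argv[1],
   or `fallback` when no argument was passed. */
int trip_count_from_args(int argc, char **argv, int fallback) {
    if (argc > 1) {
        return atoi(argv[1]);
    }
    return fallback;
}
```

A caller would typically run int count = trip_count_from_args(argc, argv, 10); in main, then pass an array of at least count floats to scale_first_n.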


The loop becomes the schedule (the device blocks/threads), so the loop trip count isn’t needed in the kernel. Hence, it’s neither hardcoded nor passed in.

  • Mat
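One way to picture the answer is a plain-C sequential simulation. This is not what any particular compiler actually emits. The host uses the trip count only to size the grid, and each simulated thread derives its iteration index from its block and thread indices. The names kernel_body and launch are hypothetical. The sketch assumes count is an exact multiple of block_dim, so every thread maps to a valid iteration and no bounds check is needed. A real compiler may choose grid shapes and handle remainders differently.

```c
#include <assert.h>

/* Hypothetical "kernel body": it sees only its block index, thread index
   and the block size, never the loop trip count. */
void kernel_body(int block_idx, int thread_idx, int block_dim, float *a) {
    int i = block_idx * block_dim + thread_idx; /* global iteration index */
    a[i] *= 2.0f;
}

/* Hypothetical host-side "launch": the trip count is used here, on the
   host, only to decide how many blocks to run. Assumes count is an exact
   multiple of block_dim. */
void launch(float *a, int count, int block_dim) {
    assert(block_dim > 0 && count % block_dim == 0);
    int num_blocks = count / block_dim;
    /* Sequential stand-in for the device running every block/thread. */
    for (int b = 0; b < num_blocks; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            kernel_body(b, t, block_dim, a);
        }
    }
}
```

In this sketch, launch(a, 8, 4) runs 2 blocks of 4 simulated threads, and kernel_body never receives the value 8.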