Acceleration of nested loops

Hi all!

I have the following code (part):


int length_new = _v.size();
int* _index_test_c = new int[length_new];   // test array: last column index read for row kk

#pragma acc kernels loop pcopyin(acol[0:acol_size], aele[0:aele_size], u[0:u_size], v[0:v_size]) pcopy(_index_test_c[0:length_new])
for(int ii=0; ii<size_vector; ii+=_block_size){
    const T idx = ii*_max_length_local;   // first stored entry of this block
    const int kk_max = _block_size+ii<size_vector ? _block_size+ii : size_vector;   // last block may be partial

    for(int kk=ii; kk<kk_max; ++kk){
        #pragma acc loop seq
        for(int jj=idx+kk-ii; jj < idx+_max_length_local*_block_size; jj+=_block_size){
            const T q = acol[jj];   // column index
            const S a = aele[jj];   // matrix value
            _index_test_c[kk] = q;
            v[kk] += a * u[q];
        }
    }
}


which fails (wrong calculation result)! If I change (second line):


-> int* _index_test_c = new int[length_new];
to
-> double* _index_test_c = new double[length_new];
or alternatively to
-> long long int* _index_test_c = new long long int[length_new];


the code works!
Therefore, I think I can conclude that the compiler is not able to handle integer and double operations in one nested loop.

My problem: this loop is just part of a bigger calculation, and I cannot rule out that the compiler has to perform some integer operations on the device (in some loops) to get the correct index. And if the compiler does so, I get the wrong result.
Is there ANY ALTERNATIVE to changing all integer values to “long long int”?

Best,
Stefan

Hi Stefan,

> Therefore, I think I can conclude that the compiler is not able to handle integer and double operations in one nested loop.

While I don’t know what’s wrong, this is probably the wrong conclusion. Double and integer operations can be used together in a nested loop.

Can you please post a reproducing example, or send one to PGI Customer Service (trs@pgroup.com), so we can investigate?

Thanks,
Mat

Hi Mat,

thanks for replying! In the next few minutes I will send a “long” mail to PGI Customer Service (trs@pgroup.com).

Best,
Stefan

Thanks, Stefan. The compiler feedback messages you sent may hold some clues.

Given the compiler feedback, it appears that you may not be using the “independent” clause on the other loops, which is why you’re seeing the messages about dependencies. The same appears to be the case for the loop you posted above.

Note that you might consider using the “parallel” directive instead of “kernels”. The main difference is that with “parallel”, loops decorated with the “loop” directive are assumed to be independent, while with “kernels” the compiler must prove independence.

One possible reason you’re seeing a difference between “int” on one side and “long long int” or “double” on the other is the aliasing rules. If you have two “int *” arrays, or in some cases even an “int *” and an “int”, the compiler must assume that the pointers may point to the same memory, which creates a potential dependency. The compiler is safe to assume independence when the pointers are of different types. One thing that helps here is to add the C99 “restrict” keyword to the declarations of the arrays, where “restrict” is an assertion to the compiler that the pointers do not overlap (in C++, compilers usually spell it “__restrict”).

Hope this helps,
Mat

Hi Mat!

I wrote:
int* __restrict _index_test_c = new int[length_new];

and it works NOW :-D

Thank you for your help! I’ll try the same for the rest of my code; it might be useful there too!

Best,
Stefan

> I’ll try the same for the rest of my code; it might be useful there too!

It should, and not just for OpenACC. It will help when auto-vectorizing CPU code as well.

Mat