I am new to standard Fortran “do concurrent” GPU coding.
Recently I got my model running successfully on an A800 GPU using “do concurrent” and “-stdpar=gpu”. It is 15 times faster than the serial CPU code.
Now I want to speed it up further, so I am trying “-stdpar=gpu -acc=gpu -gpu=nomanaged” and have added some “!$ACC ENTER DATA COPYIN()”, “!$ACC UPDATE HOST()”, and “!$ACC UPDATE DEVICE()” directives to my code. But the model result is not correct yet.
I want to figure out whether “-gpu=nomanaged” makes all “do concurrent” loops free of data transfer when the allocatable variables they use are all present on the device (already COPYIN). Or are there still some data transfers while executing “do concurrent”?
I have now found the answer myself: yes, it does. The model is now correct and 21 times faster. My previous code had some blanks before “!$ACC UPDATE DEVICE()”, which caused the directives to be ignored. I wish the compiler could warn me when there is a risk that “!$ACC UPDATE DEVICE()” will be ignored.
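For anyone reading later, here is a minimal free-form sketch of the pattern described above: copy the arrays to the device once, run the “do concurrent” loops on data that is already present, and use explicit updates only when the host or device copy changes. The program name, array names, sizes, and loop bodies are hypothetical and are not taken from the actual model. The build line in the comment just combines the flags mentioned in this thread.

```fortran
! Sketch only: names, sizes, and loop bodies are hypothetical.
! Assumed build line (flags as quoted in this thread):
!   nvfortran -stdpar=gpu -acc=gpu -gpu=nomanaged dc_sketch.f90
program dc_sketch
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable :: a(:), b(:)
  integer :: i

  allocate(a(n), b(n))
  a = 1.0
  b = 0.0

  ! Copy both arrays to the device once, before the compute loops.
  !$acc enter data copyin(a, b)

  ! a and b are already present on the device, so the intent is that
  ! this loop runs without further transfers for them.
  do concurrent (i = 1:n)
    b(i) = 2.0 * a(i)
  end do

  ! Bring results back only when the host actually needs them.
  !$acc update host(b)
  print *, 'after first loop:  b(1) =', b(1)

  ! The host changes a, so the device copy must be refreshed explicitly.
  a = 3.0
  !$acc update device(a)

  do concurrent (i = 1:n)
    b(i) = b(i) + a(i)
  end do

  !$acc update host(b)
  print *, 'after second loop: b(1) =', b(1)

  ! Release the device copies before freeing the host arrays.
  !$acc exit data delete(a, b)
  deallocate(a, b)
end program dc_sketch
```

Without unified (managed) memory, the host and device copies are separate, so a missing or ignored “!$ACC UPDATE DEVICE()” leaves the device working on stale data. That matches the wrong results described above.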
You’re likely using fixed-format source, where the directive sentinel (“!$ACC”) must start in the first column. With leading blanks the line is just an ordinary comment, which is valid Fortran, so it’s not something the compiler can warn you about.
What you can do, though, is add the flag “-Minfo=accel”. This enables the compiler feedback messages, which show where an OpenACC directive is applied. If a directive you expected doesn’t appear in the output, you’ve found the one that is being ignored.
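To illustrate the fixed-format point, here is a small sketch. The file name, program name, and array are hypothetical. It contains one indented directive and one that starts in column 1. Compiled with “-Minfo=accel” added to the flags above, I would expect feedback only for the directives that start in column 1, though the exact message text depends on the compiler version.

```fortran
! Fixed-form sketch (e.g. saved as fixed_sketch.f); names hypothetical.
      program fixed_sketch
      implicit none
      integer, parameter :: n = 1000
      real :: a(n)
      integer :: i

      a = 1.0
!$acc enter data copyin(a)

! The host changes a, so the device copy needs an update.
      a = 2.0
! Indented: in fixed form the next line is only a comment.
      !$acc update device(a)
! Sentinel in columns 1-5: this one is treated as a directive.
!$acc update device(a)

      do concurrent (i = 1:n)
         a(i) = a(i) + 1.0
      end do

!$acc update host(a)
      print *, 'a(1) =', a(1)
!$acc exit data delete(a)
      end program fixed_sketch
```

In free-form source (for example a .f90 file) the sentinel may be indented, as long as only blanks precede it on the line.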