I wrote a test code in which I declare an array FX in shared memory and assign values to it in a global kernel. I stored it in a global variable TEST. The code then goes into a device kernel that declares FX with INTENT(IN), specifically so that FX is not corrupted. Yet in the second iteration of the loop, the data gets corrupted.
I expect to get something like
The first iteration
1.000000000000000 2.000000000000000 -1.000000000000000
2.000000000000000 1.000000000000000 2.000000000000000
-1.000000000000000 2.000000000000000 1.000000000000000
The second iteration
1.000000000000000 2.000000000000000 -1.000000000000000
2.000000000000000 1.000000000000000 2.000000000000000
-1.000000000000000 2.000000000000000 1.000000000000000
Instead, it gives
The first iteration
1.000000000000000 2.000000000000000 -1.000000000000000
2.000000000000000 1.000000000000000 2.000000000000000
-1.000000000000000 2.000000000000000 1.000000000000000
The second iteration
-3.000000000000000 14.00000000000000 -50.00000000000000
-4.000000000000000 2.000000000000000 100.0000000000000
5.000000000000000 10.00000000000000 50.00000000000000
I could not figure out why it gets corrupted. Can you help? I have included the code below.
What I think is going on is that “ADJF” is pointing to the same memory as “FX”.
With dynamic shared memory, you’re basically creating one memory block, with each shared array being an offset into this block. Hence when you use “ADJF” by itself in the device routine, its offset is the same as that of “FX”.
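As a rough host-side illustration of that aliasing, here is a plain Fortran model (not CUDA Fortran, so it runs without a GPU). It treats the dynamic shared block as one array and carves FX and ADJF out of it with pointer remapping. The names smem, fill_fx, d_inv_adjf_only and d_inv_same_order are made up for this sketch, -99 is just a marker value, and the layout is only assumed to mirror what the compiler does, based on the description above.

```fortran
program shared_offset_model
  implicit none
  integer, parameter :: dp = kind(1.0d0), n = 3
  ! Host-side stand-in for the dynamic shared block: room for two n x n arrays.
  real(dp), target  :: smem(2*n*n)
  real(dp), pointer :: fx(:,:)

  ! "Global kernel" view: FX is declared first, so it starts at offset 0.
  fx(1:n,1:n) => smem(1:n*n)

  call fill_fx()
  call d_inv_adjf_only()      ! ADJF overlays FX, so FX is overwritten
  print '(a,3f8.1)', 'ADJF carved out alone:  FX(1,:) = ', fx(1,:)

  call fill_fx()
  call d_inv_same_order()     ! ADJF sits after FX, so FX is left alone
  print '(a,3f8.1)', 'Same order as kernel:   FX(1,:) = ', fx(1,:)

contains

  subroutine fill_fx()
    ! The test matrix from the question (it is symmetric).
    fx = reshape([ 1.0_dp, 2.0_dp, -1.0_dp, &
                   2.0_dp, 1.0_dp,  2.0_dp, &
                  -1.0_dp, 2.0_dp,  1.0_dp ], [n, n])
  end subroutine fill_fx

  ! ADJF is the only array carved out here, so it also starts at offset 0.
  subroutine d_inv_adjf_only()
    real(dp), pointer :: adjf(:,:)
    adjf(1:n,1:n) => smem(1:n*n)
    adjf = -99.0_dp
  end subroutine d_inv_adjf_only

  ! Arrays carved out in the same order as in the "kernel": FX, then ADJF.
  subroutine d_inv_same_order()
    real(dp), pointer :: fx_local(:,:), adjf(:,:)
    fx_local(1:n,1:n) => smem(1:n*n)
    adjf(1:n,1:n)     => smem(n*n+1:2*n*n)
    adjf = -99.0_dp
  end subroutine d_inv_same_order

end program shared_offset_model
```

In the first call the write to ADJF lands on top of FX; in the second, ADJF occupies the second half of the block and FX keeps its values.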
Adding the other shared arrays to D_INV seems to work around the issue.
Also, shouldn’t “DETF” be shared as well? It’s only getting set by one thread but used by all of them. If not, then you should move its assignment out of the if block so all threads set it.
Doing as you said solves the problem I described. Yet since some of the variables are no longer in the argument list of “D_INV”, I believe they are not carried from the “global kernel scope” to the “device kernel scope”, or vice versa. To show this, I read “DETF” in the global kernel scope after a value was assigned to it in the device kernel, and it gave 0 even though DETF is not zero.
For the automatics, these are pointing into the dynamic shared memory block. So while the variables themselves are different between the global and device routines, the memory that they point to is the same, so they effectively “carry” the results across the calls. You just need to keep the same order so they point to the same place in the dynamic shared memory in both routines.
The problem here is with “DETF”. This is a scalar, so it doesn’t point to the dynamic shared memory. Instead you have two different shared DETFs, one declared in each routine. In the first version, you only had DETF declared in the device routine. I only suggested making it shared because you have its assignment guarded in the if block, which only one thread executes. Hence when local, it would be uninitialized for the other threads.
For this code, the minimal change is to keep DETF shared in the global routine, but pass it as an argument to D_INV, without “SHARED” on its declaration.
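Something along these lines is what I mean. This is only a sketch under assumptions, not your actual code: test_kernel, fx_in and out are hypothetical names, the body of d_inv is a placeholder (a 3x3 cofactor expansion, so it assumes n = 3), and I have not run it against a particular compiler version.

```fortran
module dinv_sketch
  use cudafor
  implicit none
contains

  ! Hypothetical stand-in for D_INV. DETF is a plain dummy argument (no
  ! SHARED), so it refers to the shared scalar declared in the kernel.
  attributes(device) subroutine d_inv(n, detf)
    integer, value :: n
    real(8) :: detf
    ! Same automatic shared arrays, in the same order as in the kernel.
    real(8), shared :: fx(n,n), adjf(n,n)
    integer :: i, j
    i = threadIdx%x
    do j = 1, n
      adjf(j,i) = -fx(j,i)          ! placeholder write; must not touch FX
    end do
    if (i == 1) then
      ! Placeholder: cofactor expansion along row 1, valid for n = 3 only.
      detf = fx(1,1)*(fx(2,2)*fx(3,3) - fx(2,3)*fx(3,2)) &
           - fx(1,2)*(fx(2,1)*fx(3,3) - fx(2,3)*fx(3,1)) &
           + fx(1,3)*(fx(2,1)*fx(3,2) - fx(2,2)*fx(3,1))
    end if
    call syncthreads()              ! make DETF visible to every thread
  end subroutine d_inv

  attributes(global) subroutine test_kernel(n, fx_in, out)
    integer, value :: n
    real(8), device :: fx_in(n,n), out(n)
    real(8), shared :: fx(n,n), adjf(n,n)   ! dynamic shared memory
    real(8), shared :: detf                 ! the one shared DETF
    integer :: i, j
    i = threadIdx%x
    do j = 1, n                     ! thread i copies column i
      fx(j,i)   = fx_in(j,i)
      adjf(j,i) = 0.0d0
    end do
    call syncthreads()
    call d_inv(n, detf)
    out(i) = detf                   ! every thread reads the same DETF
  end subroutine test_kernel

end module dinv_sketch

program main
  use cudafor
  use dinv_sketch
  implicit none
  integer, parameter :: n = 3
  real(8) :: fx_h(n,n), out_h(n)
  real(8), device :: fx_d(n,n), out_d(n)
  integer :: istat

  fx_h = reshape([ 1.0d0, 2.0d0, -1.0d0, &
                   2.0d0, 1.0d0,  2.0d0, &
                  -1.0d0, 2.0d0,  1.0d0 ], [n, n])
  fx_d = fx_h
  ! Third chevron argument: bytes of dynamic shared memory (two n x n real(8)).
  call test_kernel<<<1, n, 2*n*n*8>>>(n, fx_d, out_d)
  istat = cudaDeviceSynchronize()
  out_h = out_d
  print *, out_h
end program main
```

The shared arrays are declared in both routines in the same order so they line up in the dynamic block, while DETF exists only once and is passed by reference.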