I use a struct:
to store the values located at grid points. Every grid point consists of those six float values and is calculated by its own thread. All the structs are stored in global memory, so every thread reads the "old" struct value of its grid point, calculates a new one and writes it back to global memory.
The kernel computes the new values in temporary ("dummy") variables, like
float DummyDensO = …
float DummyDensN = …
and so on, and later written to global memory.
Once the six values have been calculated in my kernel, I try to speed up the memory access by writing them back to global memory in different ways. Paradoxically, all of them take nearly the same time. Perhaps someone can help me…
- First I tried just:
GridValues[GridPointIndexGlobal].DensO = DummyDensO;
GridValues[GridPointIndexGlobal].DensN = DummyDensN;
GridValues[GridPointIndexGlobal].DensNO = DummyDensNO;
GridValues[GridPointIndexGlobal].DensO3 = DummyDensO3;
GridValues[GridPointIndexGlobal].DensN2A = DummyDensN2A;
GridValues[GridPointIndexGlobal].Temperature = DummyTemperature;
I assumed this would cost a lot of time, because it issues six independent memory accesses.
- Then I tried changing the "Dummy" variables from six floats to a single variable of the same struct type, like:
struct DBDStruct DummyGridValue;
GridValues[GridPointIndexGlobal] = DummyGridValue;
Here I expected only one memory access. It seems that both ways take nearly the same time. How is the second one implemented internally? Does it just do the same as the first one?
Is there a fast way, like "copy this many bytes from here to there", that I can call from the device?
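A possible explanation, as far as I understand it: a struct assignment is generally compiled into per-member stores of the width that the struct's alignment permits, so for a struct of six plain floats it likely ends up as the same six 4-byte stores. In both variants consecutive threads write addresses that are 24 bytes apart, so each store is strided across the warp. One layout that might be worth timing instead is a structure of arrays, where thread i writes element i of each array. The sketch below is only an illustration under that assumption: the names `GridSoA`, `updateSoA` and `runSoADemo` are hypothetical, and the "+ k" arithmetic is a placeholder for the real calculation.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical structure-of-arrays layout: one array per quantity,
// so neighbouring threads touch neighbouring floats.
struct GridSoA {
    float *DensO, *DensN, *DensNO, *DensO3, *DensN2A, *Temperature;
};

__global__ void updateSoA(GridSoA g, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Placeholder calculation standing in for the real update.
    float DummyDensO        = g.DensO[i]       + 1.0f;
    float DummyDensN        = g.DensN[i]       + 2.0f;
    float DummyDensNO       = g.DensNO[i]      + 3.0f;
    float DummyDensO3       = g.DensO3[i]      + 4.0f;
    float DummyDensN2A      = g.DensN2A[i]     + 5.0f;
    float DummyTemperature  = g.Temperature[i] + 6.0f;

    // Six stores per thread, but each one is contiguous across the warp.
    g.DensO[i]       = DummyDensO;
    g.DensN[i]       = DummyDensN;
    g.DensNO[i]      = DummyDensNO;
    g.DensO3[i]      = DummyDensO3;
    g.DensN2A[i]     = DummyDensN2A;
    g.Temperature[i] = DummyTemperature;
}

// Host-side driver: returns the number of wrong values (0 on success),
// or -1 if the device allocation fails.
int runSoADemo(int n)
{
    float* base = nullptr;
    size_t count = 6 * static_cast<size_t>(n);
    if (cudaMalloc(&base, count * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(base, 0, count * sizeof(float));   // all "old" values are 0

    // Carve the six arrays out of one allocation.
    GridSoA g = { base,         base + n,     base + 2 * n,
                  base + 3 * n, base + 4 * n, base + 5 * n };

    int threads = 256;
    updateSoA<<<(n + threads - 1) / threads, threads>>>(g, n);

    std::vector<float> h(count);
    cudaMemcpy(h.data(), base, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(base);

    int bad = 0;
    for (int k = 0; k < 6; ++k)
        for (int i = 0; i < n; ++i)
            if (h[static_cast<size_t>(k) * n + i] != float(k + 1)) ++bad;
    return bad;
}
```

Calling `runSoADemo(1000)` from `main` should return 0 if the kernel ran correctly. Whether this layout is actually faster for the real kernel would have to be measured.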
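As far as I know, `memcpy` can be called from device code, but it is executed per thread and does not by itself guarantee wide memory transactions. One thing that may be worth trying while keeping the array-of-structs layout is to give the struct 8-byte alignment and write it as three `float2` values, which replaces six 4-byte stores with three 8-byte stores per thread. The sketch below assumes the struct looks like the one reconstructed here from the member names used above. `storeGridPoint`, `updateAoS` and `runAoSDemo` are hypothetical names, and the arithmetic is again a placeholder.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Assumed layout (reconstructed from the member names in the post),
// with 8-byte alignment added so that float2 accesses are legal.
struct __align__(8) DBDStruct {
    float DensO, DensN, DensNO, DensO3, DensN2A, Temperature;
};
static_assert(sizeof(DBDStruct) == 24, "expected six packed floats");

// Write one grid point as three 8-byte stores instead of six 4-byte stores.
__device__ inline void storeGridPoint(DBDStruct* dst,
                                      float o, float n, float no,
                                      float o3, float n2a, float t)
{
    float2* d = reinterpret_cast<float2*>(dst);
    d[0] = make_float2(o,   n);
    d[1] = make_float2(no,  o3);
    d[2] = make_float2(n2a, t);
}

__global__ void updateAoS(DBDStruct* GridValues, int n)
{
    int GridPointIndexGlobal = blockIdx.x * blockDim.x + threadIdx.x;
    if (GridPointIndexGlobal >= n) return;

    DBDStruct old = GridValues[GridPointIndexGlobal];

    // Placeholder calculation standing in for the real update.
    storeGridPoint(&GridValues[GridPointIndexGlobal],
                   old.DensO + 1.0f,   old.DensN + 2.0f,
                   old.DensNO + 3.0f,  old.DensO3 + 4.0f,
                   old.DensN2A + 5.0f, old.Temperature + 6.0f);
}

// Host-side driver: returns the number of wrong values (0 on success),
// or -1 if the device allocation fails.
int runAoSDemo(int n)
{
    DBDStruct* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(DBDStruct);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1;
    cudaMemset(dev, 0, bytes);                    // all "old" values are 0

    int threads = 256;
    updateAoS<<<(n + threads - 1) / threads, threads>>>(dev, n);

    std::vector<DBDStruct> h(n);
    cudaMemcpy(h.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        const DBDStruct& v = h[i];
        if (v.DensO != 1.0f || v.DensN != 2.0f || v.DensNO != 3.0f ||
            v.DensO3 != 4.0f || v.DensN2A != 5.0f || v.Temperature != 6.0f)
            ++bad;
    }
    return bad;
}
```

With the alignment attribute in place, the compiler may already emit wider stores for the plain struct assignment, so the explicit `float2` version is mainly a way to make that intent visible. The stores of neighbouring threads are still 24 bytes apart, so this is unlikely to match a fully contiguous layout.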
Thx for any help!!!