Hi David, thank you for your answer.

Ok, so I misunderstood you.

The launch index is a uint2 (the X,Y pixel coordinate), and that is what I store for each rendered curve primitive. I then calculate the difference between the current and the previous launch XY as signed integers, since differences between pixel coordinates are always integers. The flow vector itself is then stored as a float2, of course.
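A minimal host-side C++ sketch of that difference follows. It assumes plain structs in place of CUDA's built-in uint2/float2, and the function name flowFromLaunchIndices is hypothetical. The one detail that matters is casting to int before subtracting. Subtracting two unsigned values where the previous coordinate is larger wraps around instead of going negative.

```cpp
// Stand-ins for CUDA's built-in vector types so this compiles as plain C++
// (assumption: device code would use the built-in uint2/float2 instead).
struct uint2  { unsigned int x, y; };
struct float2 { float x, y; };

// Hypothetical helper: flow = current launch index - previous launch index.
// Each component is cast to signed int BEFORE subtracting; with unsigned
// arithmetic, 461u - 469u would wrap to 4294967288u instead of -8.
inline float2 flowFromLaunchIndices(uint2 cur, uint2 prev)
{
    const int dx = static_cast<int>(cur.x) - static_cast<int>(prev.x);
    const int dy = static_cast<int>(cur.y) - static_cast<int>(prev.y);
    // Only the final, exact integer result is converted to float.
    return float2{ static_cast<float>(dx), static_cast<float>(dy) };
}
```

For the first line of the example output below, flowFromLaunchIndices({480, 461}, {473, 469}) yields (7, -8).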

Doing the subtraction in integers, instead of converting four values to float and doing two float subtractions, should be faster, shouldn't it?

And for screen pixels, whose input is already a uint2, I don't see any case where a float type would be required.

I'm not calculating any hit points. I simply use the final screen pixel of a primitive, i.e. the pixel where the curve was rendered (ignoring race conditions while writing the XY for that primitive).
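Here is a serial host-side sketch of recording "the final screen pixel of a primitive". It assumes a hypothetical per-pixel primitive-ID buffer (hitPrim, -1 for a miss), and lastPixelPerPrimitive and kNoPixel are illustrative names. In this loop the last write wins deterministically. On the GPU, several pixels of the same primitive write the same slot concurrently and whichever write lands last is kept, which is the race mentioned above.

```cpp
#include <cstddef>
#include <vector>

struct uint2 { unsigned int x, y; };  // stand-in for CUDA's uint2

// Hypothetical sentinel for primitives that no pixel hit this frame.
constexpr unsigned int kNoPixel = 0xFFFFFFFFu;

// hitPrim[y * width + x] holds the curve primitive index seen by that pixel,
// or -1 for a miss. Returns, per primitive, the launch index of the last
// pixel (in row-major order) that hit it.
std::vector<uint2> lastPixelPerPrimitive(const std::vector<int>& hitPrim,
                                         unsigned int width, std::size_t numPrims)
{
    std::vector<uint2> lastPixel(numPrims, uint2{kNoPixel, kNoPixel});
    for (std::size_t i = 0; i < hitPrim.size(); ++i) {
        const int prim = hitPrim[i];
        if (prim < 0) continue;  // miss: nothing to record
        // Later pixels overwrite earlier ones; on the GPU the winner is arbitrary.
        lastPixel[static_cast<std::size_t>(prim)] =
            uint2{ static_cast<unsigned int>(i % width),
                   static_cast<unsigned int>(i / width) };
    }
    return lastPixel;
}
```

Primitives left at kNoPixel would need separate handling (for example a zero flow) before any difference is taken.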

For the texcoord I use the curve parameter. I cannot simply calculate that for t-1, because the curve input was built from a streamOut buffer, which itself was generated from a simulated animation. So it's not the same curve at all; only its topology is guaranteed to remain identical.

I haven't understood how the part you described about the 3x3 matrix basis works, so yes, please explain that.

Here is an example output:

pixel [4054] coord: 480, 461 prev: 473, 469 flow:7.000000, -8.000000

pixel [4055] coord: 480, 459 prev: 476, 461 flow:4.000000, -2.000000

pixel [4058] coord: 488, 427 prev: 486, 427 flow:2.000000, 0.000000

pixel [5608] coord: 550, 419 prev: 550, 419 flow:0.000000, 0.000000

pixel [5609] coord: 557, 405 prev: 557, 405 flow:0.000000, 0.000000

pixel [5830] coord: 641, 104 prev: 636, 103 flow:5.000000, 1.000000
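Each flow value above is the per-component difference between the current and the previous pixel coordinate. Writing p_t for the current coordinate and p_{t-1} for the previous one (notation introduced here only for illustration), the first and last lines work out as:

$$
\mathbf{f} = \mathbf{p}_t - \mathbf{p}_{t-1}, \qquad
(480,\,461) - (473,\,469) = (7,\,-8), \qquad
(641,\,104) - (636,\,103) = (5,\,1)
$$

Under this convention a pixel's position in the previous frame is recovered as p_{t-1} = p_t - f.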

Judging from the temporally denoised output, the pixel-difference flow vectors described above seem to work, even though the same vector is applied to all pixels of a curve primitive (if it covers more than one pixel).
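A sketch of how one flow per primitive might be expanded to every pixel that primitive covers is shown below. It assumes a hypothetical per-pixel primitive-ID buffer and a per-primitive flow array, and expandFlowToPixels and previousPosition are illustrative names. Pixels that hit nothing get zero flow. The reprojection helper follows the current-minus-previous convention of the output above. Whether a particular denoiser expects that sign convention should be checked against its own documentation.

```cpp
#include <cstddef>
#include <vector>

struct float2 { float x, y; };  // stand-in for CUDA's float2

// primFlow[p]: flow computed once per curve primitive (hypothetical input).
// hitPrim[i]:  primitive index seen by pixel i, or -1 for a miss.
// Every covered pixel receives its primitive's flow, so a primitive spanning
// several pixels moves all of them by the same vector.
std::vector<float2> expandFlowToPixels(const std::vector<int>& hitPrim,
                                       const std::vector<float2>& primFlow)
{
    std::vector<float2> flowImage(hitPrim.size(), float2{0.0f, 0.0f});
    for (std::size_t i = 0; i < hitPrim.size(); ++i) {
        const int prim = hitPrim[i];
        if (prim >= 0)
            flowImage[i] = primFlow[static_cast<std::size_t>(prim)];
    }
    return flowImage;
}

// With flow = current - previous, the previous-frame position is current - flow.
inline float2 previousPosition(float x, float y, float2 flow)
{
    return float2{ x - flow.x, y - flow.y };
}
```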

Visually, the output shows no artefacts with the test colors I have used so far.