Hello everyone. I need help with the FNV function; I'm tired of fighting with it.
In its simplest form, the FNV function looks like this:
    fnv4(x, y)
        return x * 0x01000193 ^ y;
I am writing it in PTX:
    mov.u32        round, 0x00;
$LLBfnv1:
    ld.global.u64  %rM, [mixzero];          // executed every round (inside the loop)
    ld.global.u64  %rA, [mixzero+128];
    mul.hi.u64     %rt0, %rM, 0x01000193;   // upper 64 bits of the 128-bit product
    shl.b64        %rt1, %rM, 32;
    mul.hi.u64     %rt1, %rt1, 0x01000193;
    shl.b64        %rt0, %rt0, 32;
    xor.b64        %rt0, %rt0, %rt1;
    xor.b64        %rM, %rt0, %rA;
    add.u32        round, round, 1;
    setp.lt.u32    p, round, 64;
@p  bra.uni        $LLBfnv1;
I need a way to process the 128 bytes over 64 rounds. If I split the work across 16 threads running in parallel, each taking two 32-bit words (8 bytes), then the result of every round has to be kept somewhere all the threads can see it, because %rM changes depending on the results of each round.
If I keep it in shared memory, I can only run 49152 / 128 = 384 threads at the same time, which is very few.
At the moment I get 6,800,000 function executions on a GTX 660 by parallelizing not the function itself but whole calls: each thread computes all 128 bytes sequentially.
That way there is no need to save intermediate results, since each thread sees the full 128-byte result on its own.
Here is an example in PureBasic showing why the results have to be visible after each round:
    For i = 0 To 63
      p = fnv(i ! ValueL(*s), ValueL(*mix + i % w)) % (n / mixhashes) * mixhashes
      fnv64BI(*mix, *Fullarray + p * #HASH_BYTES, *mix)
      fnv64BI(*mix + #HASH_BYTES, *Fullarray + (p + 1) * #HASH_BYTES, *mix + #HASH_BYTES)
    Next i