How can I speed up this kernel?

Well, I’m still working on my own PRNG, and it seems that the seeding function, even though it’s very simple, runs very slowly compared to the actual random number generation kernel.

What could I modify in this piece of code?

[codebox]__global__ void pasqualoniInit(unsigned int seed, void* data, unsigned int dataSize)
{
    //__shared__ float sh_arr[ PASQUALONI_INIT_BLOCK_SIZE ];

    const unsigned int tid = threadIdx.x;
    const unsigned int bid = blockIdx.x;
    const unsigned int bdim = blockDim.x;

    // Global index of the element this thread initializes
    const unsigned int ind = bid * bdim + tid;

    // Threads past the end of the buffer have nothing to do
    if( ind >= dataSize )
        return;

    unsigned int * arr = (unsigned int*)data;

    arr[ ind ] = (float)ind * __sinf((float)((tid*seed)*(ind*seed)%dataSize));
}
[/codebox]

Thanks in advance.