Need help in CUDA programming

How could I optimize this code in CUDA?

for (int k = 0; k < packet; k++) {
    if (dfa->flowsP[k] == hashFlow) {
        store = 0;
    }
}

Thanks for your reply

const int tid = threadIdx.x + blockIdx.x * blockDim.x;

// Guard so threads past the end of the array do not read out of bounds
if (tid < packet && dfa->flowsP[tid] == hashFlow)
   store = 0;

Granted, I have no idea what “store” is, so I don’t know whether you need atomics. If “store” is a single variable shared by all threads, you probably won’t get much benefit from parallelizing the write itself: atomic operations on the same address are performed one at a time, so threads that hit the write take turns, which effectively serializes that part of the algorithm.
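For what it's worth, here is a minimal sketch of how the whole thing might look as a kernel plus a host-side launch. Several things are assumptions on my part and not from the original code: the name checkFlows, the host helper runCheck, the element type (unsigned int), passing the flow array directly instead of through dfa, and initializing store to 1. Since every matching thread writes the same value, this sketch uses a plain write instead of an atomic, which is usually fine for a flag like this. Error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per flow entry; any thread that finds a match clears the flag.
__global__ void checkFlows(const unsigned int* flowsP, int packet,
                           unsigned int hashFlow, int* store)
{
    const int tid = threadIdx.x + blockIdx.x * blockDim.x;
    // Bounds guard: the last block may have more threads than remaining entries.
    if (tid < packet && flowsP[tid] == hashFlow)
        *store = 0;  // every writer stores the same value, so no atomic is used
}

// Hypothetical host helper: returns the final flag (1 = no match, 0 = match).
int runCheck(const std::vector<unsigned int>& flows, unsigned int hashFlow)
{
    const int packet = static_cast<int>(flows.size());
    int store = 1;  // assumed initial value
    if (packet == 0)
        return store;  // nothing to scan, avoid a zero-block launch

    unsigned int* dFlows = nullptr;
    int* dStore = nullptr;
    cudaMalloc(&dFlows, packet * sizeof(unsigned int));
    cudaMalloc(&dStore, sizeof(int));
    cudaMemcpy(dFlows, flows.data(), packet * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dStore, &store, sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (packet + threads - 1) / threads;  // round up
    checkFlows<<<blocks, threads>>>(dFlows, packet, hashFlow, dStore);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&store, dStore, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dFlows);
    cudaFree(dStore);
    return store;
}
```

Calling runCheck({1, 2, 3}, 2) should give 0 because one entry matches, while a hash that is not in the array leaves the flag at its initial value. If store had to count matches instead of just flagging one, atomicAdd would be the natural replacement for the plain write.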

This algorithm checks whether I can capture all flows of a trace; if some flows are not captured, “store” will be 0.

You helped a lot, I’m still new to CUDA.