CUDA and fixed-point comparison on a big array. Is CUDA suitable for fixed-point comparison?

Hello,

I’m working on optimization software that runs a simulation thousands of times to find the best parameters. The simulation is very fast (around 20 ms on my computer) but performs comparisons on a lot of data (around 15 million integers).

I have multithreaded the optimization process, and it is very easy to have as many threads as I want. Would using CUDA improve the performance of my software? I’m using fixed-point operations (mostly comparisons and additions) on a big array.

Thanks,
Jean-Paul

Sounds like this is a good fit for CUDA. As on modern CPUs, simple integer operations such as additions and comparisons have roughly the same throughput as floating-point operations in CUDA.

What are the comparisons used for? One thing CUDA does not handle well is individual threads all taking different code paths. However, min(), max(), and the ternary ... ? ... : ... operator all compile to single instructions, so if the code can be formulated using these, it should be fine.
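As a rough illustration of that reformulation (a minimal sketch in plain C, not taken from the original poster's code; the function names, the arrays a and c, and the thresholds b and d are all hypothetical), a counting loop with a branch can be rewritten so that the comparison results are used as values instead of deciding a code path:

```c
#include <stddef.h>

/* Branching version: different elements may take different code paths. */
static int count_branching(const int *a, const int *c, size_t n, int b, int d)
{
    int count = 0;
    for (size_t j = 0; j < n; j++) {
        if (a[j] > b && c[j] > d) {
            count++;
        }
    }
    return count;
}

/* Branch-free version: each comparison yields 0 or 1, and the two
 * results are combined with a bitwise AND and added to the counter. */
static int count_branchless(const int *a, const int *c, size_t n, int b, int d)
{
    int count = 0;
    for (size_t j = 0; j < n; j++) {
        count += (a[j] > b) & (c[j] > d);
    }
    return count;
}
```

Both functions return the same count; the second simply has no data-dependent branch inside the loop body, which is the property that matters when many threads execute the same instruction stream.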

That depends on whether the algorithm has high arithmetic intensity: the overall speedup (which includes moving the data to GPU memory) might not be huge for GPU kernels with low arithmetic intensity. From what you have described, your code looks like it has low arithmetic intensity.

I have 5 arrays of 3 million integers each. I loop over the first array and compare each value to others (like if (a>b && c>d)), around 5 comparisons each time, which also means a lot of memory accesses. Depending on the result I do a few additions, but only about once every 200 iterations. So it’s not arithmetic intensive, just a lot of comparisons.

All the threads run the same code but with different parameters (for example a different b or a different d).
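To put a rough number on that, here is a back-of-the-envelope sketch. It assumes 4-byte integers and that each of the 5 arrays is read once per iteration; neither is stated above, and the function name is made up for illustration.

```c
#include <stddef.h>

/* Rough operations-per-byte estimate for one pass over the data.
 * Assumption: every array contributes one integer read per element. */
static double ops_per_byte(double compares_per_elem, double adds_per_elem,
                           int arrays_read, size_t bytes_per_int)
{
    double ops = compares_per_elem + adds_per_elem;
    double bytes = (double)arrays_read * (double)bytes_per_int;
    return ops / bytes;
}
```

Calling ops_per_byte(5.0, 1.0 / 200.0, 5, 4) gives about 0.25 operations per byte read, so under these assumptions the loop does far less than one operation per byte of memory traffic.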

I don’t know if GPUs can improve the performance of my code or if it’s better to just use CPUs.

Thanks.

Any idea of the performance I can get on a GPU? Or for this application is it better to use CPUs?

Thanks.

Show us your main loop. It is impossible to predict performance otherwise.

Hello,

The code looks like this (with INDEX_END around 3 million). I’m running this code thousands of times with different parameters.

for (j = 0; j < INDEX_END; j++)
{
    if (condition1)
    {
        if (tmp1 - tmp2 >= Data[j])
        {
            tmp = 0;
        }
    }

    if (condition2)
    {
        tmp = 1;
        nb++;
    }

    if (condition3)
    {
        if (tmp1 - tmp >= Result[j])
        {
            tmp2 = Data[j];
        }
    }

    if (condition4)
    {
        tmp1 = 1;
        nb++;
    }
}

Thanks.

It appears this code, as is, is entirely memory-bandwidth bound on both the GPU and the CPU. This limit can easily be circumvented, though, by checking several sets of parameters in parallel while performing a single scan through the data. Both CPU and GPU should profit from this change, but on the GPU it would be absolutely essential in order to exploit the parallelism. It should also lead to a good speedup over the CPU, close to the arithmetic throughput ratio.
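A minimal sketch of the single-scan idea in plain C follows. It uses a simplified condition of the form if (a > b && c > d) rather than the full loop posted above. The names N_SETS, struct params and scan_all_sets are hypothetical, and the per-set counter stands in for whatever state (tmp, tmp1, tmp2, nb) the real loop carries.

```c
#include <stddef.h>

#define N_SETS 4  /* hypothetical number of parameter sets checked per scan */

/* Hypothetical parameter set: the thresholds b and d of the comparison. */
struct params {
    int b;
    int d;
};

/* One pass over the data evaluates all N_SETS parameter sets.
 * a[j] and c[j] are loaded once and reused N_SETS times, so the amount
 * of work per byte of memory traffic grows with N_SETS. */
static void scan_all_sets(const int *a, const int *c, size_t n,
                          const struct params sets[N_SETS],
                          int counts[N_SETS])
{
    for (int s = 0; s < N_SETS; s++) {
        counts[s] = 0;
    }
    for (size_t j = 0; j < n; j++) {
        const int aj = a[j];  /* read each data element once */
        const int cj = c[j];
        for (int s = 0; s < N_SETS; s++) {
            /* per-set state update; branch-free on purpose */
            counts[s] += (aj > sets[s].b) & (cj > sets[s].d);
        }
    }
}
```

On the CPU the inner loop over s keeps the loaded values in registers. On a GPU, one plausible mapping is to give each thread its own parameter set and private state while all threads walk through the same data, although the best layout would depend on the real conditions.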