Find the largest value among all threads’ results: on the CPU or the GPU?

humorstar · November 23, 2007, 12:16am

I need to find the largest value among all the threads’ results. Suppose each thread comes up with a float value as a result; what is the best way to find out which one is the largest? Should I transfer all the data to the CPU and let the CPU handle this? I noticed it is very time-consuming to copy data from device memory back to the CPU.

Thank you,

humorstar · November 23, 2007, 2:30am

A follow-up question: if I do find the largest value among all threads and want to return it to the CPU, do I still have to allocate space for one float in device memory and copy it from device memory to host memory? Is there a faster way?

jordyvaneijk · November 23, 2007, 8:52am

I think it depends on what you want to do with it. If you never need it on the CPU, you don’t have to transfer it back to the host. As for your first question, I think it is very hard to find the largest value inside the kernel, because then the threads need to depend on each other. And the way I see it, that is something you don’t want to have, right?

Jordy

asadafag · November 23, 2007, 10:53am

Just take a look at the reduction sample in the SDK, please.

jordyvaneijk · November 23, 2007, 12:17pm

But doesn’t a reduction give an accumulated value rather than the largest value of the array? Or am I wrong?

MisterAnderson42 · November 23, 2007, 2:51pm

Just change the reduction example to do max() instead of +, think about what the example is doing and you will see that it works.
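To make that suggestion concrete, here is a minimal sketch of a max-reduction. It is not the SDK sample itself, only the same halving pattern with fmaxf() in place of +. The names blockMax, maxOnDevice, and BLOCK are made up for illustration, the block size is assumed to be a power of two, and error checking is left out. It also shows that only a single float has to be copied back to the host at the end.

```cuda
#include <cfloat>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; this sketch assumes a power of two

// Each block reduces its slice of `in` to one maximum, stored in out[blockIdx.x].
__global__ void blockMax(const float *in, float *out, int n)
{
    __shared__ float s[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads past the end of the array contribute the identity for max.
    s[threadIdx.x] = (i < n) ? in[i] : -FLT_MAX;
    __syncthreads();

    // Halve the active range on every pass, combining pairs with fmaxf().
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            s[threadIdx.x] = fmaxf(s[threadIdx.x], s[threadIdx.x + stride]);
        __syncthreads();  // every thread reaches this, active or not
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = s[0];
}

// Hypothetical host helper: launches blockMax until one value remains,
// then copies that single float back to the host.
float maxOnDevice(const std::vector<float> &h)
{
    if (h.empty()) return -FLT_MAX;  // nothing to reduce

    int n = (int)h.size();
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, ((n + BLOCK - 1) / BLOCK) * sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    while (true) {
        int blocks = (n + BLOCK - 1) / BLOCK;
        blockMax<<<blocks, BLOCK>>>(d_in, d_out, n);
        if (blocks == 1) break;
        std::swap(d_in, d_out);  // the per-block maxima become the next input
        n = blocks;
    }

    float result;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling maxOnDevice(v) on a std::vector<float> returns the largest element. The input is copied to the device here only to keep the sketch self-contained; if the per-thread results already live in device memory, only the final 4-byte copy is needed.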

paulius · November 25, 2007, 12:18am

The reduction approach works for any associative operator. Also, check out the CUDA Data Parallel Primitives library, CUDPP (http://forums.nvidia.com/index.php?showtopic=50392); there may already be a function that you need.
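As a rough illustration of the "any associative operator" point, the sketch below takes the operator as a template parameter, so the same kernel computes a maximum or a sum. MaxOp, SumOp, reduceBlock, and reduceOnDevice are hypothetical names, not part of CUDPP or the SDK. It uses a single block for brevity and skips error checking. Note that floating-point addition is only approximately associative, so a parallel sum can differ slightly from a sequential one.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

// Hypothetical functors: each pairs an operator with its identity element.
struct MaxOp {
    __device__ float operator()(float a, float b) const { return fmaxf(a, b); }
    __host__ __device__ static float identity() { return -FLT_MAX; }
};
struct SumOp {
    __device__ float operator()(float a, float b) const { return a + b; }
    __host__ __device__ static float identity() { return 0.0f; }
};

// Single-block reduction; BLOCK is assumed to be a power of two.
template <typename Op, int BLOCK>
__global__ void reduceBlock(const float *in, float *out, int n)
{
    __shared__ float s[BLOCK];
    Op op;

    // Each thread first folds a strided subset of the input.
    float acc = Op::identity();
    for (int i = threadIdx.x; i < n; i += BLOCK)
        acc = op(acc, in[i]);
    s[threadIdx.x] = acc;
    __syncthreads();

    // Tree reduction over the per-thread partial results.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            s[threadIdx.x] = op(s[threadIdx.x], s[threadIdx.x + stride]);
        __syncthreads();
    }

    if (threadIdx.x == 0)
        *out = s[0];
}

// Hypothetical host wrapper: copies the input over, reduces it, copies one float back.
template <typename Op>
float reduceOnDevice(const float *h, int n)
{
    const int BLOCK = 256;
    float *d_in, *d_out, result;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);
    reduceBlock<Op, BLOCK><<<1, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

With this shape, reduceOnDevice<MaxOp>(data, n) and reduceOnDevice<SumOp>(data, n) share all the indexing and synchronization code; only the combining step changes.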

Paulius

humorstar · November 25, 2007, 4:27pm

Thanks to all. Here is what I did:

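p1 = THREADS;      // number of elements still in play
__syncthreads();   // make sure every thread has written its s_solutions entry

while (p1 > 1)
{
    int mid = (p1 + 1) / 2;  // divide by 2, rounding up so an odd count keeps its last element

    if ( (threadIdx.x + mid < p1) && (s_solutions[threadIdx.x] < s_solutions[threadIdx.x + mid]) )
    {
        // push the bigger element to the first half
        s_solutions[threadIdx.x] = s_solutions[threadIdx.x + mid];
    }

    __syncthreads();  // finish this pass before the next one reads the results
    p1 = mid;
}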

As a result, s_solutions[0] will be the largest value.

Mark_Harris · November 29, 2007, 10:09pm

Unfortunately CUDPP currently only has max-scan, not max-reduce. In fact it doesn’t have any reductions yet – it’s on our todo list!

Mark