I’m a CUDA newbie, and this is my first post on the forum, so good to meet you all!
My reason for looking at GPU computation is that I need to calculate (and sum over) a large matrix, about 100000x1000. I'm working with time series data where each element of this matrix represents an integral over time. Fortunately I can discretise the time interval to make the maths easy. Let's call this matrix B. I also have two vectors, I and N, which hold event times for the indices of the matrix B. Then:
B[i,j] = p0 * I[j] + p1 * ( min( N[i] , I[j] ) - min( I[i] , I[j] ) )
It would be great to use the GPU to calculate this sum of products. However, I understand that conditional statements cause thread divergence and are likely to make this approach ineffective, and here I have two (the two min() calls). I could instead write the following:
min( N[i] , I[j] ) - min( I[i] , I[j] ) = ( I[j] -| I[i] ) - ( I[j] -| N[i] )
where the -| operator returns the difference between the operands if the difference is positive, and 0 otherwise - ie the result floors out at 0. My question is: does such an operator exist? If so, it would be a great way to avoid introducing a branching construct and destroying the thread parallelisation.
My solution so far is to define (using horribly loose notation):
a -| b = fabs( ( a-b )* !signbit( a-b ) )
This seems very complicated for what should be a simple operation, and I'm not sure whether it avoids branching under the hood. It also feels fragile: the standard only guarantees that signbit() returns non-zero for a negative sign, so I've had to wrap it in ! to force the result to exactly 0 or 1. Any advances on this??