Logarithm in a CUDA kernel? Need some help

Hi everybody,
The NOOB is back.
I've got a problem with one of my kernels, and I think it's here:

```cuda
__global__ void LogCompression_kernel(float* mode_d, float* log_d)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // as long, just in case
    int i;
    if (idx < nmlne)
    {
        for (i = 0; i < nmpts; i++)
        {
            log_d[idx * nmpts + i] = 20 * log(mode_d[idx * nmpts + i]);
        }
    }
}
```

log_d always ends up holding the same value, so I think my log call is the problem.
I've already included <math.h>.

A little help, please?

Out of curiosity, is there a reason you use that inner loop instead of launching nmpts times as many threads and writing a flat, loop-free kernel?
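A minimal sketch of what that flat kernel could look like, assuming mode_d and log_d each hold nmlne * nmpts floats stored line by line (idx * nmpts + i, as in the original). The names LogCompressionFlat_kernel and logCompressFlat and the block size of 256 are illustrative choices, not from the original post.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Flat, loop-free kernel: one thread per sample instead of one per line.
// The flat index k walks the same elements as idx * nmpts + i in the original.
__global__ void LogCompressionFlat_kernel(const float* mode_d, float* log_d, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n)  // the last block may be only partly used
        log_d[k] = 20.0f * logf(mode_d[k]);
}

// Hypothetical host helper: copy in, launch nmlne * nmpts threads, copy back.
std::vector<float> logCompressFlat(const std::vector<float>& mode, int nmlne, int nmpts)
{
    const int n = nmlne * nmpts;
    assert(n > 0 && mode.size() == static_cast<size_t>(n));
    const size_t bytes = n * sizeof(float);

    float* mode_d = nullptr;
    float* log_d = nullptr;
    cudaMalloc(&mode_d, bytes);
    cudaMalloc(&log_d, bytes);
    cudaMemcpy(mode_d, mode.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // round up so every sample gets a thread
    LogCompressionFlat_kernel<<<blocks, threads>>>(mode_d, log_d, n);

    std::vector<float> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish first.
    cudaMemcpy(out.data(), log_d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(mode_d);
    cudaFree(log_d);
    return out;
}
```

Because consecutive threads touch consecutive addresses, this version also gets coalesced accesses without changing the data layout.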

Make sure you use logf if you're operating on floats; otherwise the compiler may think you want double-precision arithmetic there (and will implicitly convert your floats to doubles and then back to floats).
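For example, here is the same per-line loop with logf and a float literal (20.0f instead of 20), so the whole expression stays in single precision. nmlne and nmpts are passed as kernel parameters by assumption, since the post doesn't show how they're defined; logCompress is a hypothetical helper for trying the kernel out.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Same loop as the original kernel, but logf and 20.0f keep everything in float.
__global__ void LogCompression_kernel(const float* mode_d, float* log_d, int nmlne, int nmpts)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per line
    if (idx < nmlne) {
        for (int i = 0; i < nmpts; i++)
            log_d[idx * nmpts + i] = 20.0f * logf(mode_d[idx * nmpts + i]);
    }
}

// Hypothetical host helper: copy in, launch one thread per line, copy back.
std::vector<float> logCompress(const std::vector<float>& mode, int nmlne, int nmpts)
{
    assert(nmlne > 0 && mode.size() == static_cast<size_t>(nmlne) * nmpts);
    const size_t bytes = mode.size() * sizeof(float);
    float *mode_d = nullptr, *log_d = nullptr;
    cudaMalloc(&mode_d, bytes);
    cudaMalloc(&log_d, bytes);
    cudaMemcpy(mode_d, mode.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;  // assumed block size
    LogCompression_kernel<<<(nmlne + threads - 1) / threads, threads>>>(mode_d, log_d, nmlne, nmpts);

    std::vector<float> out(mode.size());
    cudaMemcpy(out.data(), log_d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(mode_d);
    cudaFree(log_d);
    return out;
}
```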

Including <math.h> has nothing to do with it; kernels can't call host functions.

Do you check for errors after the kernel launch? Are you sure the kernel launches at all?
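A sketch of the usual pattern: cudaGetLastError() right after the launch reports configuration problems (for example, too many threads per block), and cudaDeviceSynchronize() reports errors raised while the kernel runs. The helper launchAndCheck is hypothetical, and the kernel takes nmlne and nmpts as parameters by assumption.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void LogCompression_kernel(const float* mode_d, float* log_d, int nmlne, int nmpts)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < nmlne) {
        for (int i = 0; i < nmpts; i++)
            log_d[idx * nmpts + i] = 20.0f * logf(mode_d[idx * nmpts + i]);
    }
}

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t launchAndCheck(const float* mode_d, float* log_d, int nmlne, int nmpts, int threadsPerBlock)
{
    int blocks = (nmlne + threadsPerBlock - 1) / threadsPerBlock;
    LogCompression_kernel<<<blocks, threadsPerBlock>>>(mode_d, log_d, nmlne, nmpts);

    // Did the launch itself fail (bad configuration, no device, ...)?
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }
    // Wait for the kernel and catch errors that happened while it ran.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

Launching with 2048 threads per block, for instance, should make launchAndCheck return an error, since current GPUs allow at most 1024 threads per block.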

First, I believe your loop is wrong if nmlne is the total number of elements. Be aware that, because of the index calculation, you access elements up to index nmpts * nmlne - 1, so both arrays need room for nmpts * nmlne floats.
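To see where that bound comes from, assume nmlne is the number of lines (one thread each) and nmpts the number of points per line, then take the largest idx that passes the idx < nmlne guard and the largest loop index i:

$$
\max_{\substack{0 \le \mathrm{idx} < \mathrm{nmlne} \\ 0 \le i < \mathrm{nmpts}}} \left(\mathrm{idx}\cdot\mathrm{nmpts} + i\right) = (\mathrm{nmlne}-1)\,\mathrm{nmpts} + (\mathrm{nmpts}-1) = \mathrm{nmlne}\cdot\mathrm{nmpts} - 1
$$

For example, nmlne = 4 and nmpts = 3 give a highest index of 3 * 3 + 2 = 11 = 4 * 3 - 1. If nmlne was instead meant as the total element count, the kernel reads and writes far past the end of both arrays.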

Second, if log_d is in global memory, you should keep your accesses coalesced; otherwise the kernel will be very inefficient. If you cannot get rid of the loop by using more threads, try a different index calculation, such as i*nmlne + idx (which means storing the data point by point rather than line by line), so that in each loop iteration neighboring threads access neighboring elements.
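A sketch of that layout, assuming the data is stored point-major so that sample (line idx, point i) sits at i * nmlne + idx. The stride is nmlne, the number of threads, so in each iteration a warp reads a contiguous run of addresses. The kernel name and the logCompressPointMajor helper (which transposes line-major input on the host) are hypothetical, only there to make the example runnable.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Assumed point-major layout: sample (line idx, point i) lives at i * nmlne + idx.
__global__ void LogCompressionCoalesced_kernel(const float* mode_d, float* log_d, int nmlne, int nmpts)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // still one thread per line
    if (idx < nmlne) {
        for (int i = 0; i < nmpts; i++) {
            // In iteration i, threads idx, idx + 1, ... touch neighboring
            // addresses, so a warp's loads and stores can be coalesced.
            int k = i * nmlne + idx;
            log_d[k] = 20.0f * logf(mode_d[k]);
        }
    }
}

// Hypothetical helper: converts line-major input (idx * nmpts + i) to the
// point-major layout, runs the kernel, and returns point-major output.
std::vector<float> logCompressPointMajor(const std::vector<float>& lineMajor, int nmlne, int nmpts)
{
    assert(nmlne > 0 && lineMajor.size() == static_cast<size_t>(nmlne) * nmpts);
    std::vector<float> pointMajor(lineMajor.size());
    for (int idx = 0; idx < nmlne; idx++)
        for (int i = 0; i < nmpts; i++)
            pointMajor[static_cast<size_t>(i) * nmlne + idx] = lineMajor[static_cast<size_t>(idx) * nmpts + i];

    const size_t bytes = pointMajor.size() * sizeof(float);
    float *mode_d = nullptr, *log_d = nullptr;
    cudaMalloc(&mode_d, bytes);
    cudaMalloc(&log_d, bytes);
    cudaMemcpy(mode_d, pointMajor.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;  // assumed block size
    LogCompressionCoalesced_kernel<<<(nmlne + threads - 1) / threads, threads>>>(mode_d, log_d, nmlne, nmpts);

    std::vector<float> out(pointMajor.size());
    cudaMemcpy(out.data(), log_d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(mode_d);
    cudaFree(log_d);
    return out;
}
```

In a real program you would store the data point-major from the start rather than transposing on the host each time.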

Welcome… The NOOB’s question is in FRONT.

By the way, did you graduate from SSN College? I may know you; Anto is a familiar name.

Sorry, "anto" isn't a name; it's a nickname. :shifty: