newbie CUBLAS question


I just started playing with GPUs. For the most part I need to do linear operations, and I have coded up much of the stuff using CUBLAS. In one case I need to loop over an entire matrix in device memory and compute exp(x) for each element. Of course this can be done in parallel. What's an easy way to perform this operation? Is there a one-liner or some sample code?


Just write your own small kernel… that seems fairly easy for your computation.

But just calculating exp() in parallel might not buy you much speed (in fact, the CPU-GPU memory transfer might make your code slower) unless you really have a huge matrix.

This is a small part of a computation that needs to be done again and again. The copying will only be done once, so there will be a speedup. I have never written any "kernel code," so if you could point me to an example, that would be great.
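Something along these lines should do it (an untested sketch; the names expKernel and applyExp are made up, and it assumes the matrix is stored contiguously in device memory, which is the case for the usual CUBLAS column-major layout):

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Each thread handles one element of the flattened matrix.
__global__ void expKernel(float *d_data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_data[i] = expf(d_data[i]);
}

// Host-side wrapper: treat the rows x cols matrix as one flat array
// of rows*cols floats already resident on the device.
void applyExp(float *d_matrix, int rows, int cols)
{
    int n = rows * cols;
    int threadsPerBlock = 256;  // a common, reasonable default
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    expKernel<<<blocks, threadsPerBlock>>>(d_matrix, n);
}
```

Since the matrix is already in device memory from your CUBLAS calls, you can pass the same device pointer straight to applyExp; there is no extra host-device transfer.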

Alternatively, if someone can point me to where cublasSscal is defined in the CUBLAS library – the source code – then I can just modify that for my purposes.