I’m trying to convert a C program I have into CUDA. In the original C program I took the square root of a scalar (wtw) by including the header with `#include "math.h"` and then calling the sqrt function like this: `*wtw = sqrt(*wtw);`.

Now that I’m converting the program into CUDA code I want to do the exact same thing, but I can’t get it to work. I’ve been reading the CUDA math API reference manual, but I don’t understand it and there are no clear examples. This is what my code looks like:

```
…
/* Create a wtw variable on the device */
double *wtw;
cudaMalloc(&wtw, (p*p) * sizeof(double));
…
/* wtw becomes the output from a cublasDgemv call */
cublasDgemv(handle, CUBLAS_OP_T, n, p, &alpha, w, n, w, incx, &beta, wtw, incx);
/* Try to calculate the square-root of wtw (but fail) */
wtw = sqrt(wtw);
```

When I compile it I get the error: “no instance of overloaded function "sqrt" matches the argument list”.

The mathematical function documentation simply says: ‘CUDA mathematical functions are always available in device code’, and regarding the use of sqrt: ‘`__device__ double sqrt(double x)`’. What does that even mean?
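For reference, here is a minimal sketch of what calling sqrt "in device code" could look like. It assumes the values live in device memory and that the square root is applied to each `double` value inside a kernel, rather than to the pointer itself. The names `sqrt_in_place` and `sqrt_on_device` and the sample values are hypothetical and not part of the program above. No extra math header is included, on the assumption that nvcc makes the device math functions available on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: replaces each of the n device values with its square root.
__global__ void sqrt_in_place(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // sqrt is applied to a double value (x[i]), not to the pointer x.
        x[i] = sqrt(x[i]);
    }
}

// Hypothetical host helper: copies n values to the device, runs the kernel,
// and copies the results back into the same host array.
void sqrt_on_device(double *host, int n)
{
    double *dev = NULL;
    cudaMalloc(&dev, n * sizeof(double));
    cudaMemcpy(dev, host, n * sizeof(double), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    sqrt_in_place<<<blocks, threads>>>(dev, n);
    cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}

int main(void)
{
    double values[4] = {4.0, 9.0, 16.0, 25.0};
    sqrt_on_device(values, 4);
    for (int i = 0; i < 4; ++i) {
        printf("%f\n", values[i]);
    }
    return 0;
}
```

In this sketch, a pointer such as `wtw` would be passed to the kernel in place of `dev`, with `n` set to the number of values to transform. Error checking on the CUDA calls is omitted for brevity.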

Should I include something at the start of my program to make it work? What am I doing wrong? If someone could provide a simple example of how to do this, I would greatly appreciate it.

Thanks.