Hello, my name is Alan and this is my first post, I’m currently working at my university assisting a professor with his research. He gave the task to program a simple function for its program, but it has to be optimized for parallel computing. This is my first week of coding under this architecture, I already read the programming guide and best practices guide and took a look at GPU Gems 3, but I’m still unable to make this function work.
It is a simple one, it has the form of:
e^(-gamma*(x-y)(x-y))
Where gamma is a constant, and multiplies the dot product of (x-y) and (x-y)
My professor told me that I’m not supposed to use any of the math functions, in other words I need to recode everything, so I made use of taylor series to create the e function, and I’m also considering the error for truncating the series. I also programmed the factorial function, but i think that wasn’t really necessary and I’m thinking that I can get rid of it, and I spent almost a day to program the dot product function when I found the cublasSdot function that made exactly what I wanted, but now that I’m trying to put everything together, it doesn’t even compile. I’ve been making changes to the code all day long and I can’t get it, I now that my current code is more than wrong, but I’m just stocked and I hope some of you could show me the right direction. I’m not asking to do the job for me, just show what am I doing wrong.
I’ve read most of the whitepapers in the SDK section, and I’ve runned all the examples, but when I tried to implement a concept shown there i simply can’t.
This is my code:
[codebox]
#include <stdio.h>
#include “cublas.h”
long fact(int n)
{
if(n==1)
return 1;
else
return n*fact(n-1);
}
float exp(long* x)
{
return 1+x+(xx)/fact(2)+(xxx)/fact(3)+(xxxx)/fact(4);
}
float gaussian(float* gamma, float* X, float* Y, float* N)
{
return exp(-gamma*cublasSdot(N,X-Y,1,X-Y,1));
}
int main()
{
cublasStatus status;
// Kernel invocation with N threads
int N=10;
float dp;
size_t size = N*sizeof(float);
float* h_A = (float*)malloc(size);
float* h_B = (float*)malloc(size);
float *d_A;
cudaMalloc((void**)&d_A,size);
float *d_B;
cudaMalloc((void**)&d_B,size);
status = cublasInit();
if (status != CUBLAS_STATUS_SUCCESS) {
fprintf (stderr, "CUBLAS initialization error\n");
return EXIT_FAILURE;
}
//Invoke kernel
dp = cublasSdot(N,d_A,1,d_B,1);
printf(“%f”,dp);
cudaMemcpy(d_A,h_A,size,cudaMemcpyHostToDevice);
cudaMemcpy(d_B,h_B,size,cudaMemcpyHostToDevice);
//Free device memory
cudaFree(d_A);
cudaFree(d_B);
}
[/codebox]