Hi all,
I am a newcomer to the CUDA programming community: I am currently testing, with my poor GeForce GT 120, whether CUDA can help speed up my math calculations. I am writing my first program of my own, but I observe mysterious behavior in a fairly simple operation…
A few technical details that may be relevant:
I'm using Mac OS X 10.6.4 with two GeForce GT 120 cards.
I am coding in C++ in the file rotMat3.cu, then compiling it to a .cpp file:
nvcc -I"/Applications/MATLAB_R2009b.app/extern/include" --machine 32 --gpu-architecture sm_11 --cuda "rotMat3.cu" --output-file "rotMat3.cpp"
After this I use the mex compiler (essentially g++) to build MEX files that MATLAB can call:
mex -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart rotMat3.cpp
I am asking for help because debugging through MEX files and CUDA looks quite tricky.
I am doing some fairly simple vector calculations (in the spirit of the standard vectorAdd example), with everything stored in global memory for now.
Here are the few lines of my kernel function that behave mysteriously:
[codebox]//vector initialization
int i;
for (i=0; i<(order+1)*(order+1); i++){*(outputMat+i) = 0;} //outputMat is float*
//…
//fill vector
iterOutput = outputMat + n1 + m1*(order+1);
float polynom = powf(cosAng, k11+k22)*powf(sinAng, k12+k21)*powf(-1, k12);
float factorials = (normalizer*factK1*factK2)/nFact;
(*iterOutput) = (*iterOutput) + polynom*factorials; //problem here
[/codebox]
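To tell a wrong value apart from a kernel that never ran, it may help to recompute the same term on the CPU and compare it with what comes back in MATLAB. Here is a minimal host-side C++ sketch, assuming normalizer, factK1, factK2 and nFact are floats (their types are not shown in the snippet); the function name referenceTerm is hypothetical. It computes (-1)^k12 from the parity of k12 instead of powf(-1, k12), which gives the same ±1 for integer exponents.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side helper (not part of the original kernel): recomputes
// the value one thread adds at outputMat[n1 + m1*(order+1)], so GPU results
// can be checked against a CPU reference.
// Assumption: normalizer, factK1, factK2 and nFact are floats.
float referenceTerm(float cosAng, float sinAng,
                    int k11, int k12, int k21, int k22,
                    float normalizer, float factK1, float factK2, float nFact)
{
    assert(nFact != 0.0f);  // the kernel divides by nFact

    // (-1)^k12 from the parity of k12; same +1/-1 as powf(-1, k12)
    // for integer k12, without calling pow on a negative base.
    float sign = (k12 % 2 == 0) ? 1.0f : -1.0f;

    // Same expression as "polynom" in the kernel.
    float polynom = std::pow(cosAng, static_cast<float>(k11 + k22)) *
                    std::pow(sinAng, static_cast<float>(k12 + k21)) * sign;

    // Same expression as "factorials" in the kernel.
    float factorials = (normalizer * factK1 * factK2) / nFact;

    return polynom * factorials;
}
```

For example, referenceTerm(0.5f, 0.5f, 1, 1, 0, 0, 1.0f, 1.0f, 2.0f, 4.0f) returns -0.125f, the value a thread with those inputs would add to its entry of outputMat.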
In practice this code does not fill my vector outputMat at all: it does nothing. I suspect that it crashes silently, i.e. that the threads are stopped, but I get no error message.
The strange thing is that if I replace the last line with:
[codebox](*iterOutput) = (*iterOutput) + polynom; //works[/codebox]
or
[codebox](*iterOutput) = (*iterOutput) + factorials; //works[/codebox]
the output is correctly filled with the polynom values or the factorials values, respectively. I therefore suspect that the product polynom*factorials is the source of the crash. But why?
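One way to check whether the product itself is the problem is to evaluate it on the CPU with the same inputs. As far as I know, neither the CPU (with default settings) nor a CUDA device traps on floating-point overflow: an overflowing product becomes ±inf and an invalid one NaN, so a bad value would show up as inf or NaN in outputMat rather than as an untouched vector. A small sketch, with a hypothetical helper named productIsFinite:

```cpp
#include <cfloat>
#include <cmath>

// Hypothetical helper (not from the original post): multiplies the two
// factors as the kernel does and reports whether the result is finite.
// Overflow yields +/-inf and, e.g., inf * 0 yields NaN; neither aborts.
bool productIsFinite(float polynom, float factorials)
{
    float product = polynom * factorials;
    return std::isfinite(product);
}
```

For instance, productIsFinite(-0.25f, 0.5f) is true, while productIsFinite(FLT_MAX, 2.0f) is false because the product overflows to inf.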
Maybe it's an FMAD issue, but I barely understand anything about that. I tried replacing the product with the _fmul intrinsic function, without any success.
Does anyone have any idea how to fix my issue, and where it comes from?
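The failing line, (*iterOutput) + polynom*factorials, is a multiply followed by an add, the pattern nvcc may contract into a single FMAD instruction, whereas the two working variants have no multiplication at that point. As far as I understand, on sm_1x devices FMAD truncates the intermediate product instead of rounding it, so contraction should only change the last bits of the result. Here is a host-side C++ sketch comparing a separately rounded multiply-add with std::fma; std::fma rounds once and is not identical to sm_1x FMAD, so this only illustrates the size of the effect. Both function names are hypothetical. (On the device, the CUDA documentation describes the __fmul_rn() intrinsic as never being merged into an FMAD.)

```cpp
#include <cmath>

// Hypothetical comparison (not from the original post) of the kernel's last
// line, acc + polynom * factorials, evaluated two ways.

// Product rounded to float first, then added. The volatile store keeps the
// host compiler from contracting the two operations into one.
float accumulateSeparately(float acc, float polynom, float factorials)
{
    volatile float product = polynom * factorials;
    return acc + product;
}

// Fused multiply-add: a single rounding at the end (std::fma, not sm_1x FMAD).
float accumulateFused(float acc, float polynom, float factorials)
{
    return std::fma(polynom, factorials, acc);
}
```

For typical inputs the two results differ by at most about one unit in the last place, which would change values slightly but would not leave outputMat empty.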
Thanks,
nicolas