Hello,
I am new to CUDA and I have something strange happening.
If I run the code in EMULATION mode it works fine, but if I run it in DEBUG mode (on the GPU) it gives incorrect results: every value I get back from the kernel is 0.
I modified the code of the C++ integration example in the SDK because I have an existing C++ program to parallelize.
I need to pass about 30 variables to the kernel and get 4 back. Here is how I'm doing this…
C++ code:
runTest(Xr, Xt, Yr, Yt, Ys, Ye, Zs, a_xy, b_xy, a_xz, b_xz, Xs, d, lambda,
        &max_h, &max_v, cos_alfa, &isFresnel, &isLOS, percent_, resolution,
        sin_beta, cos_beta, a_elipse, b_elipse, x_elipse, y_elipse,
        topo->grid, topo->maxX, topo->maxY, topo->grid_resolution,
        topo->scaleC, topo->scaleR);
cppIntegration.cu:
kernel<<<1, threadsPerBlock>>>(Xr, Xt, Yr, Yt, Zs, a_xy, b_xy, a_xz, b_xz, d, lambda,
                               cos_alfa, percent_, resolution, sin_beta, cos_beta,
                               a_elipse, b_elipse, x_elipse, y_elipse,
                               h_grid, grid_resolution, scaleC, scaleR,
                               d_maxh, d_maxv, d_isFresnel, d_isLOS);
The values I need to get back are d_maxh, d_maxv, d_isFresnel and d_isLOS. This is how I initialize them (shown here for d_maxv):
double* d_maxv;
size_t sizeMaxv = num * sizeof(double);
cudaMalloc((void**)&d_maxv, sizeMaxv);
double* h_maxv = (double*)malloc(sizeMaxv);
I don't copy initial values to the device because they are calculated on the GPU, but I read the results back with:
cudaMemcpy(h_maxv, d_maxv, sizeMaxv, cudaMemcpyDeviceToHost);
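Reduced to a single output array, the allocate, launch and copy-back pattern described above could be sketched roughly as follows, with an error check after each step so that a failed launch shows up as an error code rather than as zeros. This is only a sketch under assumptions: the kernel fillResults and the helper runAndFetch are hypothetical stand-ins (not the real kernel), num is assumed to fit in one block, and cudaDeviceSynchronize assumes a reasonably recent toolkit.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: thread i writes one result.
__global__ void fillResults(double* d_maxv, int num)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num) {
        d_maxv[i] = 2.0 * i;
    }
}

// Hypothetical helper: allocates device memory, launches the kernel and
// copies the results into h_maxv. Returns the first CUDA error encountered.
// Assumes num is small enough to fit in a single block.
cudaError_t runAndFetch(double* h_maxv, int num)
{
    double* d_maxv = NULL;
    size_t sizeMaxv = num * sizeof(double);

    cudaError_t err = cudaMalloc((void**)&d_maxv, sizeMaxv);
    if (err != cudaSuccess) {
        return err;
    }

    fillResults<<<1, num>>>(d_maxv, num);

    // Errors in the launch itself (bad configuration, too many resources).
    err = cudaGetLastError();

    // Errors raised while the kernel was running.
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    // Copy the results back only if everything so far succeeded.
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_maxv, d_maxv, sizeMaxv, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_maxv);
    return err;
}
```

On the host side the return value can be printed with cudaGetErrorString(err), which helps tell a kernel that never ran apart from one that ran and wrote zeros.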
Does anyone have an idea?
Thanks,
Vojdan.