I am new to CUDA and I have something strange happening.
If I run the code in EMULATION mode it works fine, but if I run it in DEBUG mode (on the GPU) it doesn’t give correct results: every value I get back from the kernel is 0.
I modified the code of the C++ integration example in the SDK because I have an existing C++ program that I need to parallelize.
I need to give about 30 variables to the kernel, and get 4 back. Here is how I’m doing this…
cent_,resolution,sin_beta,cos_beta, a_elipse, b_elipse, x_elipse, y_elipse, h_grid, grid_resolution, scaleC,scaleR, d_maxh, d_maxv, d_isFresnel, d_isLOS);
The values I need to get are d_maxh, d_maxv, d_isFresnel and d_isLOS. The code for their initialization is:
size_t sizeMaxv = num * sizeof(double);
double* h_maxv = (double*)malloc(sizeMaxv);
I don’t copy initial values to the device because they are calculated on the GPU, but I copy the results back with:
cudaMemcpy(h_maxv, d_maxv, sizeMaxv, cudaMemcpyDeviceToHost);
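To narrow this down, here is a minimal sketch of the same allocate, launch and copy-back pattern with an error check after each step. It is not my real code: fillKernel and runFill are hypothetical stand-ins, the launch configuration is arbitrary, and it assumes a toolkit that provides cudaDeviceSynchronize (older toolkits use cudaThreadSynchronize) and a GPU that supports double precision.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: writes 2*i into element i.
__global__ void fillKernel(double* d_maxv, int num)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num) {
        d_maxv[i] = 2.0 * i;
    }
}

// Hypothetical helper: allocates the device buffer, runs the kernel and
// copies num doubles back into h_maxv. Returns the first error encountered.
cudaError_t runFill(int num, double* h_maxv)
{
    size_t sizeMaxv = num * sizeof(double);
    double* d_maxv = NULL;

    cudaError_t err = cudaMalloc((void**)&d_maxv, sizeMaxv);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    int threads = 128;                            // arbitrary block size
    int blocks = (num + threads - 1) / threads;   // round up to cover num
    fillKernel<<<blocks, threads>>>(d_maxv, num);

    // A failed launch (bad configuration, too many arguments, ...) is only
    // reported here; the launch statement itself returns nothing.
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Errors raised while the kernel runs show up after a synchronize.
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_maxv, d_maxv, sizeMaxv, cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_maxv);
    return err;
}
```

Calling runFill(num, h_maxv) with a host buffer of num doubles should return cudaSuccess and fill the buffer. If it instead prints an error on the GPU while emulation mode runs cleanly, that would at least tell me which step fails.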
Does anyone have an idea?