Problem with double precision matrix float works, double doesn't

mahnaz · April 22, 2009, 1:25am

[font=“Courier New”][font=“Courier New”]Hello. So far I haven’t received any answer for my questions here, I hope to have better luck this time.

I am trying to learn matrix operations in CUDA (with Tesla C1060). I have an NN symmetric matrix organized in coumn major. I want to increment the upper half elements by a constant. My simple code has a total of NN threads, so only N(N+1)/2 threads will be active.

I run the kernel and check the matrix after 1000 iterations. With the “CorrectKernel” the results are as expected whether the TYPE is double or float. With “WrongKernel”, if the TYPE is float the results are as expected (the upper half is updated to 2000 and the rest of the matrix is 0), but if the TYPE is double the upper half is updated correctly but the rest of the matrix has random values. It seems like with “double” either the memory gets messed up (alignment problems?) or the idle threads don’t stay idle! What am I doing wrong? I appreciate any help.

[font=“Courier New”][font=“Courier New”]

[codebox]#define TYPE double

#define N 256

TYPE *matrix1, *matrix2;

TYPE alpha = 2;

[/font][/font][font=“Courier”][font=“Courier”]CUDA_SAFE_CALL( cudaMalloc( (void**) &matrix1, NNsizeof(TYPE)));

CUDA_SAFE_CALL( cudaMalloc( (void**) &matrix2, NNsizeof(TYPE)));

for(int i=0; i<1000; i++)

CorrectKernel <<<N,N>>>(matrix1, alpha );

for(int i=0; i<1000; i++)

WrongKernel <<<N,N>>>(matrix2, alpha );

global void CorrectKernel(TYPE *base, const TYPE alpha)

{

int bx= blockIdx.x* N + threadIdx.x;

if(threadIdx.x <= blockIdx.x){

    base[ bx] += alpha;

else

    base[bx] = 0; //force the idle threads to do something

__syncthreads();

}

global void WrongKernel(TYPE *base, const TYPE alpha)

{

int bx= blockIdx.x* N + threadIdx.x;

if(threadIdx.x <= blockIdx.x){

   base[ bx] += alpha;     //only active threads increment, idle threads stay idle

__syncthreads();

}[/codebox]

[/font][/font]

[font=“Courier New”][font=“Courier New”]

[/font][/font][/font][/font]

tmurray · April 22, 2009, 2:15am

Compile with -arch sm_13. Meanwhile, I will punch the compiler team for allowing people to use doubles without returning an error when you’re not compiling with sm_13…

seibert · April 22, 2009, 3:10am

Heheheh… Isn’t the automatic demotion of double precision to single precision a documented “feature” of CUDA? That was the line all the way back to CUDA 0.8, though I’m not sure if anyone actually relies on this behavior. I’m not even sure if this automatic-demotion works in practice since it seems like anyone who forgets -arch sm_13 has their kernel explode all over rather than quietly lose precision.

Making double usage without sm_13 a compiler errors gets a +1 vote from me if such a change is possible now.

mahnaz · April 22, 2009, 3:38am

Thanks for your response. I do complie with “-arch sm_13 -code sm_13” (though I’m not sure if I should have -code). If the code doesn’t have an obvious problem, I suspect that I am missing some initialization. What is the proper way of initializing the GPU in each run?

tmurray · April 22, 2009, 4:14am

It’s not that bad whenever you explicitly define a double within a kernel on an architecture that doesn’t support it (constants, basically). It’s wrong whenever you try to pass a double from the host to the device without compiling with arch sm_13. I think the most useful behavior would be a warning in the former (somebody hit that earlier today, for example) and an error for the latter.

mahnaz: Are you checking errors both on the launch (cudaGetLastError immediately after trying to launch the DP kernel) and the kernel execution (check the return status of cudaThreadSynchronize)?

Pimbolie1979 · April 22, 2009, 7:01am

1. Is my modification OK?

“$(CUDA_BIN_PATH)\nvcc.exe” -arch sm_13 -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MTd -I"$(CUDA_INC_PATH)" -I./ -I…/…/common/inc -o $(ConfigurationName)\simpleCUFFT.obj simpleCUFFT.cu

2. Is this compiler switch better for my 285GTX

-gencode arch=compute_13,code=sm_13

mahnaz · April 22, 2009, 2:00pm

[font=“Verdana”]No, I haven’t been checking errors. I will. The problem is that most of the times the code runs fine. It is an intermittent problem and I can’t reproduce it. When I reboot the system things are correct, until something goes wrong and after that it doesn’t recover and the errors show up with each run. That’s why I think I must be overstepping on something and need to do a better initializing of the GPU. I know I don’t have memory leak, and am very careful about freeing allocated memory. Also the system never crashes, only the content of memory gets messed up. It always work with emurelase and I can’t see any race condition in that simple kernel.

Thanks for your help.

[/font]

mahnaz · April 22, 2009, 2:08pm

Sorry for the double post. I don’t know what I am doing wrong in quoting the code that it doesn’t come out right (the preview window is different from what shows up when I post). I just noticed that in my original post part of the code is cut off. I will add the Correct and Incorrect kernels here, and appreciate some input as to why when idle threads update the memory the result is correct but when they don’t do anything it gets randome values. Thanks again.

[b]
global void CorrectKernel(TYPE *base, const TYPE alpha)
{

int bx= blockIdx.x* N + threadIdx.x;

if(threadIdx.x <= blockIdx.x)
base[ bx] += alpha;
else
base[bx] = 0; //force the idle threads to do something

__syncthreads();
}

global void WrongKernel(TYPE *base, const TYPE alpha)
{

int bx= blockIdx.x* N + threadIdx.x;

if(threadIdx.x <= blockIdx.x)
base[ bx] += alpha; //just the active threads update the array

__syncthreads();
}
[/b]

Topic		Replies	Views
Double precision numbers, emulation, and compute capability < 1.3 CUDA Programming and Performance	5	1720	August 11, 2009
Kernel works in single precision but not in double CUDA Programming and Performance	7	1596	July 28, 2009
Double precision in CUDA 2.3 CUDA Programming and Performance	5	38171	March 5, 2010
Issues with double precision support on GT200 CUDA Programming and Performance	7	2717	July 7, 2008
Matrix Multiplication Buggy CUDA Programming and Performance	13	5233	May 5, 2010
problem with double precision unpredictable results Different run give differents errors or no error CUDA Programming and Performance	12	2783	September 10, 2010
constant memory usage problem unexpected behavior using constant memory CUDA Programming and Performance	15	9703	April 1, 2009
Declaring double precision constants in kernel? CUDA Programming and Performance	7	7531	July 18, 2009
Wrong output in double precision CUDA Programming and Performance	20	5909	May 20, 2011
This code doesn't work maybe too much threads assigned? CUDA Programming and Performance	8	1085	February 2, 2014

Problem with double precision matrix float works, double doesn't

Related topics