worked fine for "int" "float" but NOT "double"

winterfire · March 9, 2009, 4:10pm

Hi,

I have written a simple program with the kernel function below, and I have made sure the initialization steps are correct. The strange thing is that the program produces correct results for “int” and “float”, but not for “double”. Please help, thanks.

__global__ void mytest(int *test1, double *test2) {
    int idx = threadIdx.x;

    test1[idx] = idx;
    test2[idx] = idx;
}

output:
test1[1]=1; test1[2]=2;…
test2[1]=0; test2[2]=0;…

note: test1 and test2 are initialized to 0.

Jamie_K · March 9, 2009, 4:29pm

From the programming guide, page 24:

“Some ptx instructions are only supported on devices of higher compute capabilities. For example, atomic instructions on global memory are only supported on devices of compute capability 1.1 and above; double-precision instructions are only supported on devices of compute capability 1.3 and above.”

YDD · March 9, 2009, 4:47pm

To expand on the previous post, you need to either set -arch sm_13 in your call to nvcc or use the CUFILES_sm_13 variable before including common.mk in your makefile.

winterfire · March 9, 2009, 4:49pm

Thank you Jamie for your quick reply!!

I used “nvcc -arch sm_13” as suggested, but “double *test2” still shows no sign of being written by the kernel function. More specifically, test2 always shows its initial value of 0. Any more suggestions? Thanks.

YDD · March 9, 2009, 5:23pm

Can you post the complete source file?

winterfire · March 9, 2009, 5:55pm

Code attached. Thanks.

Please rename “mytest.txt” to “mytest.cu” before compiling. I wasn’t allowed to upload files with a “.cu” extension.
mytest.txt (1.24 KB)

YDD · March 9, 2009, 6:13pm

I get the following output:

[codebox][ydd@localhost tmp]$ make
mkdir -p /usr/local/NVIDIA_CUDA_SDK/common//…/lib
mkdir -p obj/release
mkdir -p ./bin//release
/usr/local/cuda/bin/nvcc -o obj/release/test.cu_sm_13_o -c test.cu --compiler-options -fno-strict-aliasing -I. -I/usr/local/cuda/include -I/usr/local/NVIDIA_CUDA_SDK/common//…/common/inc -DUNIX -O3 -arch sm_13
g++ -fPIC -o ./bin//release/HelloCUDA obj/release/test.cu_sm_13_o -L/usr/local/cuda/lib -L/usr/local/NVIDIA_CUDA_SDK/common//…/lib -L/usr/local/NVIDIA_CUDA_SDK/common//…/common/lib/linux -lcudart -L/usr/local/cuda/lib -L/usr/local/NVIDIA_CUDA_SDK/common//…/lib -L/usr/local/NVIDIA_CUDA_SDK/common//…/common/lib/linux -lcutil
[ydd@localhost tmp]$ ./bin/release/HelloCUDA
vnew[0] is 0; vnew2[0] is 0;vnew[0] is 0
vnew[1] is 1; vnew2[1] is 1;vnew[1] is 1
vnew[2] is 2; vnew2[2] is 2;vnew[2] is 2
vnew[3] is 3; vnew2[3] is 3;vnew[3] is 3
vnew[4] is 4; vnew2[4] is 4;vnew[4] is 4
vnew[5] is 5; vnew2[5] is 5;vnew[5] is 5
vnew[6] is 6; vnew2[6] is 6;vnew[6] is 6
vnew[7] is 7; vnew2[7] is 7;vnew[7] is 7
vnew[8] is 8; vnew2[8] is 8;vnew[8] is 8
vnew[9] is 9; vnew2[9] is 9;vnew[9] is 9
vnew[10] is 10; vnew2[10] is 10;vnew[10] is 10
vnew[11] is 11; vnew2[11] is 11;vnew[11] is 11
vnew[12] is 12; vnew2[12] is 12;vnew[12] is 12
vnew[13] is 13; vnew2[13] is 13;vnew[13] is 13
vnew[14] is 14; vnew2[14] is 14;vnew[14] is 14
vnew[15] is 15; vnew2[15] is 15;vnew[15] is 15
vnew[16] is 16; vnew2[16] is 16;vnew[16] is 16
vnew[17] is 17; vnew2[17] is 17;vnew[17] is 17
vnew[18] is 18; vnew2[18] is 18;vnew[18] is 18
vnew[19] is 19; vnew2[19] is 19;vnew[19] is 19
[ydd@localhost tmp]$
[/codebox]

I think that’s what you wanted, isn’t it?

What card do you have in your machine, and what’s your compile command?

winterfire · March 9, 2009, 6:33pm

Thanks YDD! Yes, that’s what I wanted. It’s really strange: I’m getting different results from yours. Could it be the system configuration?

Machine: Tesla S870.

Tesla S870 GPU Computing System

* Four GPUs (128 thread processors per GPU)
* 6 GB of system memory (1.5 GB dedicated memory per GPU)
* Standard 19", 1U rack-mount chassis
* Connects to host via cabling to a low power PCI Express x8 or x16 adapter card
* Configuration: 2 PCI Express connectors driving 2 GPUs each (4 GPUs total)

Commands:

nvcc mytest.cu -o mytest

./mytest

Here is my output (I corrected the third column label from “vnew” to “vnew3”):

[codebox]vnew[0] is 0; vnew2[0] is 0;vnew3[0] is 3
vnew[1] is 1; vnew2[1] is 1;vnew3[1] is 3
vnew[2] is 2; vnew2[2] is 2;vnew3[2] is 3
vnew[3] is 3; vnew2[3] is 3;vnew3[3] is 3
vnew[4] is 4; vnew2[4] is 4;vnew3[4] is 3
vnew[5] is 5; vnew2[5] is 5;vnew3[5] is 3
vnew[6] is 6; vnew2[6] is 6;vnew3[6] is 3
vnew[7] is 7; vnew2[7] is 7;vnew3[7] is 3
vnew[8] is 8; vnew2[8] is 8;vnew3[8] is 3
vnew[9] is 9; vnew2[9] is 9;vnew3[9] is 3
vnew[10] is 10; vnew2[10] is 10;vnew3[10] is 3
vnew[11] is 11; vnew2[11] is 11;vnew3[11] is 3
vnew[12] is 12; vnew2[12] is 12;vnew3[12] is 3
vnew[13] is 13; vnew2[13] is 13;vnew3[13] is 3
vnew[14] is 14; vnew2[14] is 14;vnew3[14] is 3
vnew[15] is 15; vnew2[15] is 15;vnew3[15] is 3
vnew[16] is 16; vnew2[16] is 16;vnew3[16] is 3
vnew[17] is 17; vnew2[17] is 17;vnew3[17] is 3
vnew[18] is 18; vnew2[18] is 18;vnew3[18] is 3
vnew[19] is 19; vnew2[19] is 19;vnew3[19] is 3
[/codebox]

tmurray · March 9, 2009, 6:37pm

S870 is Compute 1.0, so you can’t do DP on it.

mfatica · March 9, 2009, 6:38pm

The S870 does not support double precision (it has G80 GPUs).
You need an S1070 for double precision.
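
To confirm what a given card supports, the runtime can be queried for its compute capability. The following is a minimal sketch, assuming the GPU of interest is device 0; the device index, the variable names, and the exit codes are illustrative choices and are not taken from the attached mytest.cu.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the card in question is device 0
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Double-precision instructions need compute capability 1.3 or above.
    bool hasDouble = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);

    printf("%s: compute capability %d.%d, double precision %s\n",
           prop.name, prop.major, prop.minor,
           hasDouble ? "supported" : "not supported");

    // Illustrative exit codes: 0 if doubles are usable, 2 otherwise.
    return hasDouble ? 0 : 2;
}
```

On a multi-GPU system such as the S870, the same query could be repeated for each index below the count returned by cudaGetDeviceCount.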

winterfire · March 9, 2009, 6:39pm

One more thing:

When I compiled with “nvcc mytest.cu -arch sm_13 -o mytest” and ran “./mytest”, I got the following output, where all three arrays seem to hold their initial values.

[codebox]vnew[0] is 1; vnew2[0] is 2;vnew3[0] is 3
vnew[1] is 1; vnew2[1] is 2;vnew3[1] is 3
vnew[2] is 1; vnew2[2] is 2;vnew3[2] is 3
vnew[3] is 1; vnew2[3] is 2;vnew3[3] is 3
vnew[4] is 1; vnew2[4] is 2;vnew3[4] is 3
vnew[5] is 1; vnew2[5] is 2;vnew3[5] is 3
vnew[6] is 1; vnew2[6] is 2;vnew3[6] is 3
vnew[7] is 1; vnew2[7] is 2;vnew3[7] is 3
vnew[8] is 1; vnew2[8] is 2;vnew3[8] is 3
vnew[9] is 1; vnew2[9] is 2;vnew3[9] is 3
vnew[10] is 1; vnew2[10] is 2;vnew3[10] is 3
vnew[11] is 1; vnew2[11] is 2;vnew3[11] is 3
vnew[12] is 1; vnew2[12] is 2;vnew3[12] is 3
vnew[13] is 1; vnew2[13] is 2;vnew3[13] is 3
vnew[14] is 1; vnew2[14] is 2;vnew3[14] is 3
vnew[15] is 1; vnew2[15] is 2;vnew3[15] is 3
vnew[16] is 1; vnew2[16] is 2;vnew3[16] is 3
vnew[17] is 1; vnew2[17] is 2;vnew3[17] is 3
vnew[18] is 1; vnew2[18] is 2;vnew3[18] is 3
vnew[19] is 1; vnew2[19] is 2;vnew3[19] is 3
[/codebox]

winterfire · March 9, 2009, 6:52pm

I see. To be more specific, I am using an XE320/Tesla cluster with 16 XE320 compute nodes, and I ran the code on one node.

By the way, I couldn’t find any detailed documentation about using the cluster. For example, I updated the /etc/vimhrc settings on the head node, but the change does not replicate to the other nodes.

Could any of you please suggest some documents? You can send them to my inbox if possible. Thanks a lot.

YDD · March 9, 2009, 7:01pm

You need to upgrade to the S1070 if you want double precision (as tmurray and mfatica said). I’m a bit surprised that the CUDA runtime didn’t whinge when your Compute 1.3 kernel tried to run on the Compute 1.0 GPU, but it’s not something you should expect to work. As for using the cluster itself… that’s something you need to discuss with the people who run it - such things have a tendency to be uniquely temperamental :)
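
On the runtime staying quiet: a kernel launch does not return a status by itself, so the host code has to ask for one. Below is a minimal sketch of doing that around the kernel from the first post. The CUDA_CHECK macro, the d_/h_ variable names, and the array size of 20 are assumptions made for illustration and are not taken from the attached mytest.cu. It uses cudaDeviceSynchronize, which is the current name; toolkits from the time of this thread provided cudaThreadSynchronize for the same purpose.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error and exit if a runtime call fails.
#define CUDA_CHECK(call)                                        \
    do {                                                        \
        cudaError_t err_ = (call);                              \
        if (err_ != cudaSuccess) {                              \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                    cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

__global__ void mytest(int *test1, double *test2) {
    int idx = threadIdx.x;
    test1[idx] = idx;
    test2[idx] = idx;
}

int main() {
    const int n = 20;  // assumption: matches the 20 output lines in the thread
    int *d_test1 = NULL;
    double *d_test2 = NULL;
    double h_test2[n];

    CUDA_CHECK(cudaMalloc((void **)&d_test1, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void **)&d_test2, n * sizeof(double)));
    CUDA_CHECK(cudaMemset(d_test2, 0, n * sizeof(double)));

    mytest<<<1, n>>>(d_test1, d_test2);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_test2, d_test2, n * sizeof(double),
                          cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) {
        printf("test2[%d] = %g\n", i, h_test2[i]);
    }

    CUDA_CHECK(cudaFree(d_test1));
    CUDA_CHECK(cudaFree(d_test2));
    return 0;
}
```

With checks like these in place, a launch that the device cannot run should show up as an error message instead of arrays that silently keep their initial values.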

winterfire · March 9, 2009, 7:38pm

I guess I will use float instead then.

Thanks YDD, and thank you all for the help!
