I bought a GTX 260 card. According to the official documentation, this chip
supports double precision, but when I use “double” in my calculations
the result is wrong. It does work, however, when I use emulation mode.
(My approach is to change “float” to “double” in the original code, which
works correctly with “float” computation.)
When compiling, pass the flag “-arch sm_13” (without the quotes) to the compiler,
i.e. nvcc -arch sm_13
This enables the program to compile for double precision. Without it, nvcc targets the default sm_10 architecture, which has no double-precision support, and demotes doubles to floats in device code. If you are still getting wrong answers after this, then the problem is with your code. I assume that you are using the latest version of the CUDA toolkit.
However, when I use “-arch sm_13” on single-precision “float” code, it also works.
Does nvcc convert every “float” to “double”?
Besides, the nvcc manual says:
--gpu-code <gpu architecture name>,... (-code)
Specify the name of nVidia gpu to generate code for.
Unless option -export-dir is specified (see below), nvcc will embed a compiled
code image in the executable for each specified 'code' architecture, which is a
true binary load image for each 'real' architecture (such as a sm_13), and ptx
code for each virtual architecture (such as compute_10). During runtime, such
embedded ptx code will be dynamically compiled by the cuda runtime system if no
binary load image is found for the 'current' GPU, and provided that the ptx
level is compatible with this current GPU.
Architectures specified for options -arch and -code may be virtual as well as
real, but the 'code' architectures must be compatible with the 'arch'
architecture. For instance, 'arch'=compute_13 is not compatible with 'code'=sm_10,
because the earlier compilation stages will assume the availability of
compute_13 features that are not present on sm_10. This option defaults to the
value of option '-arch'.
Allowed values for this option: ‘compute_10’, ‘compute_11’, ‘compute_13’,
The compute capabilities of each card are listed in Appendix A of the Programming Guide, I think.
“-arch sm_13” does not convert all floats to doubles. However, constants without an “f” suffix (2.0 instead of 2.0f, for example) are treated as doubles, because that is their standard C data type, so any float expression that uses them is evaluated in double precision.