Double precision problem

ok i have a gtx260 and here is my problem

i wrote a cuda code that solves mathematical equations …

when i use FLOAT variables in the code i get the normal results…

when i use DOUBLE variables in the code i get 0.0000 in each variable…

why is that? i tested another code and i had the same results…

260 doesn't support double precision? which versions do?


Compile with -arch sm_13, and you’ll have double precision on GTX 260/280/Quadro FX 5800/Tesla C1060/S1070.
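for example (the file names here are placeholders, not from the thread):

```shell
# sm_13 targets compute capability 1.3, which is what adds double precision
nvcc -arch sm_13 -o app.exe kernel.cu
```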

You missed the Quadro FX 4800 (and Tesla S1075) :whistling: ;)

I did not miss the Tesla S1075. ;)

ok…new problem…

i used that option, it compiled and it got results, but the numbers are wrong… for example instead of 80.000 it got 120.000, things like that…

does this mean there might be a problem with my GPU?

how can it give right numbers in single precision and wrong numbers in double?

Or (infinitely more likely) you’ve done something wrong? How do you expect us to tell you without code, compiler options, your configuration, etc.?

ok. First, i can't post the code cos it's my work on a project and it's about 900 lines… but the fact that the code gives normal results with FLOAT variables means there is no problem with it, right? When i simply search and replace all the FLOAT with DOUBLE i get wrong results.

Compiler options? i use this to compile in the command prompt : nvcc -arch sm_13 -o sbj.exe

what do you mean by configuration? it's a gtx260 with default settings, no OC or things like that, with the latest drivers that nvidia has.

if it's 900 lines, how about trying to replicate the problem with a minimal amount of code?
and no, just changing float to double won't do the trick…

i’ll just give a short example:
you have a kernel with float arithmetic and there is some “__shared__ float foo[3000];” standing around.
what happens if you change the float to double? will it still work?

no! because you would no longer use 12,000 bytes of shared memory but 24,000 bytes! and as this is more than 16k, you'll either get a compiler error (if you're lucky) or you'll just see garbage as a result.

do you see what i mean? it’s usually errors of this sort that produce such results. but we can only help you, if you show us some source, where the error is present.

16k is the limit in memory? which means i can use up to 2000 variables in double?

16kb is the amount of shared memory available within every multiprocessor.
for a list of all the limits, see the cuda programming guide, pages 78f.

ps: don’t try to use exactly 16kb, it won’t work, as some bytes are already reserved for function parameters and variables like threadIdx.

__device__ void prog1(…)
__device__ void prog2(…)

__global__ void computations () {
prog1(…) ;

prog2(…) ;
}

int main(){
bla bla

computations <<<1,1>>> (...);

bla bla
}

This is the body of my program…

When i tried to compile it (using float variables) with this command : “nvcc -D_CRT_SECURE_NO_DEPRECATE -ptx -o jet.exe”
it says on screen : “Program too big to fit in memory”

why is that?

And also, as you said, shared memory is limited to 16KB. When i declare a variable without any qualifier such as __device__ or __shared__, where is the variable stored generally?

and especially when i declare it inside int main(), or inside a __global__ or a __device__ function?

Variables without qualifiers go into registers.

ok thx seibert.

anyone got a clue about the other things i posted?

leave out parts of your program and see if it compiles. if not, leave out more.
if you've found something that causes the issue, post what you had to delete last to make it work; then it should be much clearer what the problem is.

ok. thats what i am trying now.

but plz someone tell me, when is a variable stored in shared memory?

and what does this “program too big to fit in memory” mean?

A variable is only stored in shared memory if you put a __shared__ qualifier before the data type, i.e.:

__shared__ float x;
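a minimal sketch of that in context (the kernel and names here are made up for illustration, not from the thread): “tile” lives in shared memory because of the qualifier, while “i” has no qualifier and ends up in a register.

```cuda
// hypothetical kernel, launched with 256 threads in one block
__global__ void example(float *out)
{
    __shared__ float tile[256];   // 256 * 4 = 1024 bytes of shared memory
    int i = threadIdx.x;          // no qualifier: stored in a register

    tile[i] = (float)i;
    __syncthreads();              // make all writes visible to the whole block
    out[i] = tile[255 - i];
}
```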