Double Precision Problem

Hi to all,
I developed a code in single precision (compiled with the -r4 option) and it runs without problems.
I bought a new GPU that supports double precision, and I have now compiled my code with the -r8 option, but the executable gives me problems when I launch a kernel.
I am instrumenting my code to track down a possible bug. Do you have any suggestions?
Thanks a lot to all,
Enrico

Hi Enrico,

“but the executable gives me problems when I launch a kernel.”

What’s the error?

- Mat

It’s a general error:

unspecified launch failure

Enrico

Hi Enrico,

Are you able to run any CUDA code on the new device? Does the original version compiled with “-r4” work? If not, try updating your CUDA Driver.

Otherwise, “unspecified launch failure” typically means your kernel aborted abnormally, for example because of a memory access violation.
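One way to narrow down where the failure occurs is to check the error status right after the launch and again after synchronising. The sketch below is only an illustration: the `scale` kernel, the module name `kernels_m`, and the sizes are made up and stand in for the real code. `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` come from the `cudafor` module.

```fortran
module kernels_m
  use cudafor
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical kernel standing in for the real one: scales a vector in place.
  attributes(global) subroutine scale(a, n, s)
    integer, value  :: n
    real(dp), value :: s
    real(dp)        :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = s * a(i)   ! guard against out-of-bounds threads
  end subroutine scale
end module kernels_m

program check_launch
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1024          ! assumed problem size
  real(dp), device, allocatable :: a_d(:)
  real(dp) :: a(n)
  integer :: istat

  a = 1.0_dp
  allocate(a_d(n), stat=istat)
  if (istat /= 0) stop 'device allocation failed'
  a_d = a                                 ! host-to-device copy

  call scale<<<(n + 255) / 256, 256>>>(a_d, n, 2.0_dp)

  ! Errors detected when the launch is issued (e.g. a bad configuration)
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) &
    print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel runs show up after synchronising
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) &
    print *, 'execution error: ', trim(cudaGetErrorString(istat))

  a = a_d                                 ! device-to-host copy
  print *, 'max value: ', maxval(a)
  deallocate(a_d)
end program check_launch
```

An error reported at launch usually points at the launch configuration, while one that appears only after synchronisation usually points at something inside the kernel, such as an out-of-bounds index.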

- Mat

Hi Mat,
First of all, thanks for the reply.
When I compile my code with the “-r4” flag, it runs without any problem.
I investigated my code by manually applying the types “double precision” and “double complex” instead of “real” and “complex”.
I found that I have a problem with the Z matrix in my MoM code. In particular, when I use it in single precision, the results are correct, but when I declare it in double precision, the values of the matrix are incorrect.
I hope I’ve been clearer… Do you have any suggestions?

Enrico

Hi Enrico,

Are you using CUDA Fortran or the PGI Accelerator Model?

Doubling the data type size also doubles the amount of memory you’re using. Could you be running out of memory? If you are using CUDA Fortran, check the allocation status after each allocate call: “allocate(arr(somesize),stat=istat)”
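A minimal sketch of that check, assuming a square double complex matrix; the matrix order `n` and the names `z_d`, `free_b`, and `total_b` are hypothetical. It also queries the free device memory with `cudaMemGetInfo` from the `cudafor` module, so the requested size can be compared with what is available.

```fortran
program check_alloc
  use cudafor
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4096          ! hypothetical matrix order
  complex(dp), device, allocatable :: z_d(:,:)
  integer(kind=cuda_count_kind) :: free_b, total_b
  integer(8) :: needed
  integer :: istat

  ! A double complex element takes 16 bytes, twice the 8 bytes of single complex
  needed = 16_8 * int(n, 8) * int(n, 8)

  istat = cudaMemGetInfo(free_b, total_b)
  if (istat /= cudaSuccess) &
    print *, 'cudaMemGetInfo failed: ', trim(cudaGetErrorString(istat))
  print *, 'bytes needed: ', needed
  print *, 'bytes free:   ', free_b, ' of ', total_b

  ! A non-zero stat means the device allocation did not succeed
  allocate(z_d(n, n), stat=istat)
  if (istat /= 0) then
    print *, 'device allocation failed, stat = ', istat
    stop
  end if

  deallocate(z_d)
end program check_alloc
```

Without `stat=`, a failed allocation normally stops the program with a runtime error; with it, the code can report which array did not fit and continue or exit cleanly.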

- Mat

Hi Mat,
I am using CUDA Fortran. I checked the allocation status, and that is not the problem.
I have a problem with a matrix: the code compiles and runs, but the values of the matrix elements are incorrect when I declare it as “double complex”.
I don’t know what to do…

Enrico

Hi Enrico,

Is there any way you can write up a reproducing example?
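Something as small as the following skeleton might be enough as a starting point. It is only a sketch under assumptions: the `fill_z` kernel, the module name `fill_m`, and the matrix order are invented, and the placeholder formula stands in for the real MoM matrix entries. It fills a double complex matrix on the device and compares it with the same values computed on the host. All kinds are written out explicitly, so the result should not depend on -r4 or -r8.

```fortran
module fill_m
  use cudafor
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Placeholder fill; the real matrix-entry computation would go here.
  attributes(global) subroutine fill_z(z, n)
    integer, value :: n
    complex(dp)    :: z(n, n)
    integer :: i, j
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y
    if (i <= n .and. j <= n) then
      ! kind=dp keeps the result double complex; cmplx without it
      ! returns the default complex kind
      z(i, j) = cmplx(real(i, dp), real(j, dp), kind=dp)
    end if
  end subroutine fill_z
end module fill_m

program repro
  use cudafor
  use fill_m
  implicit none
  integer, parameter :: n = 64            ! assumed small test size
  complex(dp), device, allocatable :: z_d(:,:)
  complex(dp) :: z(n, n), z_ref(n, n)
  type(dim3) :: grid, tblock
  integer :: i, j, istat

  allocate(z_d(n, n), stat=istat)
  if (istat /= 0) stop 'device allocation failed'

  tblock = dim3(16, 16, 1)
  grid   = dim3((n + 15) / 16, (n + 15) / 16, 1)
  call fill_z<<<grid, tblock>>>(z_d, n)
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) &
    print *, 'kernel error: ', trim(cudaGetErrorString(istat))

  z = z_d                                 ! device-to-host copy

  ! Host reference using the same formula as the kernel
  do j = 1, n
    do i = 1, n
      z_ref(i, j) = cmplx(real(i, dp), real(j, dp), kind=dp)
    end do
  end do

  print *, 'max abs difference: ', maxval(abs(z - z_ref))
  deallocate(z_d)
end program repro
```

If the skeleton gives matching values, the real kernel body, its argument declarations, and its launch configuration can be moved into it one piece at a time until the wrong values reappear.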

Thanks,
Mat

Hi Mat,
I tried to write a reproducing example, but simple codes run without problems and give correct results.
I can give you all the code, but it consists of 5 files (1 main program and 4 modules) plus a makefile, so it is not simple or quick to understand.

Thanks,
Enrico