CUDA_ERROR_INVALID_VALUE on different architectures

Hi all,

I compile and run exactly the same program on two different architectures, one local and one remote (with the different libraries each one needs, of course), and I get this error:

Exception in thread "main" jcuda.CudaException: CUDA_ERROR_INVALID_VALUE
	at jcuda.driver.JCudaDriver.checkResult(JCudaDriver.java:282)
	at jcuda.driver.JCudaDriver.cuMemAlloc(JCudaDriver.java:3443)

I use JCuda, and it seems that the memory allocation is failing ONLY on the remote device. The local one runs normally.

  • Could it be the -ptx compilation? I use the same command for both:
nvcc -ptx filename.cu -o filename.ptx
  • Or could it be a permissions problem on the remote server?

Thanks in advance, Riccardo

  • Are the GPUs different on the two systems? Which GPUs are installed?
  • What’s the Streaming Multiprocessor version CUDA is reporting on the two GPUs?
  • What SM version is in the .target line of the resulting PTX file?
  • What kind of remoting?
    Windows Remote Desktop does not run the graphics on a GPU. That mirror desktop is not attached to any display. You would need a VNC connection or some other technology that allows GPU access in that case.
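
To answer the .target question without opening the file by hand, the two header directives can be read from the PTX text. This is a minimal sketch using only the Java standard library. The class PtxHeader and its parse method are hypothetical helpers, not part of JCuda, and the sample string in main only mirrors values mentioned in this thread.

```java
// Hypothetical helper (not part of JCuda): extracts the .version and .target
// directives from the text of a PTX file.
final class PtxHeader {
    final String version; // PTX ISA version, e.g. "3.1"
    final String target;  // first .target entry, e.g. "sm_35"

    PtxHeader(String version, String target) {
        this.version = version;
        this.target = target;
    }

    // Scans the PTX text line by line; a field stays null if its directive is absent.
    static PtxHeader parse(String ptxText) {
        String version = null;
        String target = null;
        for (String rawLine : ptxText.split("\n")) {
            String line = rawLine.trim();
            if (line.startsWith(".version")) {
                version = line.substring(".version".length()).trim();
            } else if (line.startsWith(".target")) {
                // ".target sm_10, map_f64_to_f32" -> keep only the first entry
                String rest = line.substring(".target".length()).trim();
                target = rest.split(",")[0].trim();
            }
        }
        return new PtxHeader(version, target);
    }

    public static void main(String[] args) {
        // Sample header text; in practice this would be the contents of filename.ptx.
        String sample = ".version 1.4\n.target sm_10, map_f64_to_f32\n";
        PtxHeader header = parse(sample);
        System.out.println(header.version + " " + header.target);
    }
}
```

Running the same check on the PTX file from each machine shows at a glance whether the two builds target the same SM version.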

I’m using an HPC system via ssh. Remote machine:

  • Kepler GK110
  • .version 3.1
    .target sm_35

local machine

  • GTX 770
  • .version 1.4
    .target sm_10, map_f64_to_f32

The .version is the PTX ISA version; 3.1 is generated by CUDA 5.0.
I don’t know which CUDA Toolkit .version 1.4 is from.
Which CUDA toolkit are you using on the local machine?

The PTX on the local machine targets sm_10 and maps 64-bit floating-point operations to 32-bit.
But the GTX 770 is a GK104, which supports sm_30, so you’re not using it to its full potential.
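
The target-versus-device reasoning can be written as a small check. It relies on the general rule that PTX built for a given target can be JIT-compiled for devices of equal or higher compute capability. This is only a sketch: PtxCompat and its methods are hypothetical names, it assumes plain "sm_<digits>" targets, and it takes compute capability 3.0 for the GTX 770 (GK104) and 3.5 for the GK110. Both combinations reported in this thread pass the check, so it rules out a target mismatch but does not explain the cuMemAlloc failure.

```java
// Hypothetical helper (not part of JCuda): compares a PTX .target value
// with a device's compute capability.
final class PtxCompat {
    // "sm_35" -> 35. Assumes a plain "sm_<digits>" target with no suffix.
    static int smNumber(String target) {
        if (!target.startsWith("sm_")) {
            throw new IllegalArgumentException("unexpected target: " + target);
        }
        return Integer.parseInt(target.substring(3));
    }

    // True when the PTX target is not newer than the device's
    // compute capability (major.minor).
    static boolean canJit(String target, int major, int minor) {
        return smNumber(target) <= major * 10 + minor;
    }

    public static void main(String[] args) {
        System.out.println(canJit("sm_10", 3, 0)); // local: sm_10 PTX, GTX 770
        System.out.println(canJit("sm_35", 3, 5)); // remote: sm_35 PTX, GK110
        System.out.println(canJit("sm_35", 3, 0)); // sm_35 PTX on the GTX 770
    }
}
```

The first two calls print true and the third prints false, which is why a PTX file built with .target sm_35 should not be copied to the local machine.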

Does it work when you compile the PTX with -arch sm_30 on both systems using CUDA Toolkit 5.0?
Does it change when adding --use_fast_math to the nvcc command line?
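
Since the program is driven from Java, the suggested rebuild can also be launched from there, so that both machines compile with exactly the same flags. This is a sketch under assumptions: nvcc must be on the PATH, PtxCompile and nvccCommand are hypothetical names, and the file names are the placeholders from the original post.

```java
// Hypothetical helper (not part of JCuda): rebuilds the PTX file with an
// explicit -arch value by invoking nvcc as an external process.
final class PtxCompile {
    // Builds the nvcc command line; arch is a value such as "sm_30".
    static String[] nvccCommand(String cuFile, String ptxFile, String arch) {
        return new String[] {"nvcc", "-ptx", "-arch", arch, cuFile, "-o", ptxFile};
    }

    public static void main(String[] args) throws Exception {
        String[] command = nvccCommand("filename.cu", "filename.ptx", "sm_30");
        // inheritIO forwards nvcc's messages to this process's console.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("nvcc failed with exit code " + exitCode);
        }
    }
}
```

To try the second suggestion, "--use_fast_math" can be added to the returned array in the same way.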

The local machine has CUDA 5.5, while the remote one has 5.0.
Using the parameters you suggested doesn’t change anything (though it takes better advantage of the device, which is better than nothing).

Ok, I’ll need to leave it for the CUDA and Linux experts to answer then.