CUDA runtime 'invalid argument' error when invoked from JNI

We are trying to develop a CUDA library for Java through the JNI mechanism and have run into a strange problem: every cudaXXX function returns an ‘invalid argument’ error when called from the JNI context. I reduced the test code to its bare minimum, like this –

JniTest.java

class JniTest {

    static {
        System.loadLibrary("cudatest");
    }

    native void nativetest();

    public static void main(String[] args) {
        new JniTest().run();
    }

    void run() {
        String bitness = System.getProperty("sun.arch.data.model");
        System.out.println("sun.arch.data.model = " + bitness);
        nativetest();
    }
}

JniTest.c

#include <jni.h>
#include <stdio.h>
#include <cuda_runtime.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Query the CUDA runtime version and print the result (or the error). */
void cudatest()
{
   int ver = 0;
   cudaError_t err = cudaRuntimeGetVersion(&ver);
   printf("Runtime Version = %d, error = %d,%s\n", ver, err, cudaGetErrorString(err));
}

/*
 * Class:     JniTest
 * Method:    nativetest
 * Signature: ()V
 */
JNIEXPORT void JNICALL Java_JniTest_nativetest
  (JNIEnv *e, jobject o)
{
   cudatest();
}

/* Uncomment to build the same code as a standalone native program.
int main()
{
  cudatest();
  return 0;
}
*/

#ifdef __cplusplus
}
#endif

As you can see, the code doesn’t really do much: it just transfers control from the Java side to the JNI side to call cudaRuntimeGetVersion and print the runtime version. The JVM is 64-bit, so the native/CUDA side is compiled as 64-bit too.

It fails on this particular box running 64-bit RHEL6 with CUDA toolkit 4.01 and the v290 Nvidia driver. The printout from the program is:

sun.arch.data.model = 64

Runtime Version = 0, error = 11,invalid argument

I tried other CUDA runtime functions (everything except kernel launches) and they all return this ‘invalid argument’ error.

What is stranger is that I also compiled the code as a standalone program (by uncommenting the main() function) and it works, correctly printing the ‘Version = 401’ message. It also works if I load this shared library through dlopen from a native program on this same box. In other words, it only fails when it is invoked through JNI.
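For reference, the dlopen test was essentially along these lines (a minimal sketch; the library path and the direct call to cudatest() through dlsym are simplifications of what we actually run):

/* dlopen_test.c - load the JNI library from a plain native process,
 * bypassing the JVM, and call cudatest() directly. Build with:
 * cc dlopen_test.c -ldl
 */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("./libcudatest.so", RTLD_NOW);   /* path is an assumption */
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    void (*fn)(void) = (void (*)(void)) dlsym(handle, "cudatest");
    if (!fn) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }

    fn();                /* succeeds here, unlike when called via JNI */
    dlclose(handle);
    return 0;
}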

The same Java+JNI logic works fine on my other dev box, which runs Ubuntu 10.04 with the same CUDA toolkit and Nvidia driver version.

We even swapped the graphics cards between the two systems and that doesn’t help – it always fails on the RHEL6 box no matter which card is in it.

Directly calling the CUDA driver API works fine under JNI, but we don’t intend to use the driver API, so that doesn’t help.

I tried to debug the native side with gdb around the CUDA call, and it seems that after the call returns with the failure, libcuda.so is not loaded into the process at all, while in the setup that works, libcuda.so is loaded in the process.
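The same check can also be done without gdb by scanning /proc/self/maps from inside the native method right after the failing call; a minimal sketch (the "libcuda" substring match is the only assumption):

/* Check whether libcuda.so is mapped into the current process by scanning
 * /proc/self/maps. Equivalent to "info sharedlibrary" in gdb, but callable
 * from inside the JNI method right after the failing CUDA call.
 */
#include <stdio.h>
#include <string.h>

static int libcuda_is_loaded(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[1024];
    int found = 0;

    if (!maps)
        return -1;
    while (fgets(line, sizeof line, maps)) {
        if (strstr(line, "libcuda")) {   /* matches any libcuda.so.* mapping */
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}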

Since the CUDA runtime (libcudart.so) is a wrapper around the CUDA driver API, it seems that somewhere inside the runtime, before it forwards the call to the driver API, it detects some unfavorable environment condition and just decides to abort the function without ever invoking the driver API.

Without the source code of the CUDA runtime, it’s difficult for us to figure out what exactly is upsetting the CUDA runtime for such a simple test program. I am hoping that somebody has encountered a similar problem in the past, or that somebody from Nvidia with access to the CUDA runtime source code can shed some light on this.

Thanks!

You need to use a more recent driver for CUDA 4.
Download the one from the developer page, or, even better, update to CUDA 4.1 and the corresponding driver (285.05.33).

Sorry, I made a typo in the original post: it should be v290, not v190. We are using the latest CUDA toolkit (4.1, though cudaRuntimeGetVersion returns the integer value ‘401’) and the release driver (290.10; driver API lib: libcuda.so.290.10).

What puzzles us is that we have the same CUDA toolkit and graphics driver on both systems. The problem occurs only on RHEL6, and only when we call the CUDA runtime from the JNI environment. It almost sounds like the CUDA runtime doesn’t like to run in a thread created by the JVM on RHEL6. Since the CUDA runtime does have to maintain some per-thread state (the attachment of a CUDA context to a thread), I am wondering if that has something to do with it. BTW, we tried Oracle JDK 1.6.0_29 and 1.6.0_30, as well as the default OpenJDK on RHEL6, all with the same error.

Thanks.

Sheldon

Upon further investigation I found out that the cuInit() driver API fails too under JNI, with error code 1 (invalid value). cuDriverGetVersion works and returns 4010. The cuInit failure is apparently the reason why all the CUDA runtime functions fail.
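The check was essentially the following, added to the native method (a minimal sketch; only cuInit and cuDriverGetVersion from cuda.h are used, linked with -lcuda):

/* Minimal driver API probe, called from the JNI native method.
 * On the failing RHEL6 box, cuDriverGetVersion succeeds (4010) but
 * cuInit returns 1 (CUDA_ERROR_INVALID_VALUE) when invoked from a
 * JVM-created thread.
 */
#include <cuda.h>
#include <stdio.h>

static void drivertest(void)
{
    int ver = 0;
    CUresult r = cuDriverGetVersion(&ver);
    printf("cuDriverGetVersion -> %d, version = %d\n", (int) r, ver);

    r = cuInit(0);                      /* flags must be 0 */
    printf("cuInit -> %d\n", (int) r);
}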

I had a look at what libcuda.so depends on, and it seems that it uses quite a few pthread functions and probably tries to enumerate the current threads of the host process when cuInit is called. That would explain why there is a difference between cuInit being called straight from a native application and being called from JNI – the JVM has already created 28 threads by the time cuInit is called from JNI. If cuInit tries to enumerate the current threads and do something with them, there is a chance that the particular way RHEL6/JVM creates these threads gets in the way of cuInit().
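(For reference, one simple way to get that thread count is to list /proc/self/task from the native method just before cuInit; a sketch, nothing CUDA-specific:)

/* Count the threads in the current process by listing /proc/self/task.
 * When called from the JNI method on the RHEL6 box, the JVM has already
 * started a few dozen threads at this point.
 */
#include <dirent.h>
#include <stdio.h>

static int count_threads(void)
{
    DIR *d = opendir("/proc/self/task");
    struct dirent *e;
    int n = 0;

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] != '.')        /* skip "." and ".." */
            n++;
    }
    closedir(d);
    return n;
}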

With that assumption in mind, I devised a workaround: a native launcher that calls cuInit first and then creates the JVM through the JNI invocation interface to load and run our Java app with its CUDA-based native methods. Unsurprisingly, that works.
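The launcher looks roughly like this (a sketch only; the main class name, the classpath option, and the omitted error handling are simplifications; link with -lcuda -ljvm):

/* Native launcher: initialize the CUDA driver on the primordial thread
 * first, then create the JVM and run JniTest.main().
 */
#include <cuda.h>
#include <jni.h>
#include <stdio.h>

int main(void)
{
    /* 1. Call cuInit before any JVM threads exist. */
    CUresult cr = cuInit(0);
    printf("cuInit -> %d\n", (int) cr);

    /* 2. Create the JVM via the invocation interface. */
    JavaVMOption opt;
    opt.optionString = "-Djava.class.path=.";       /* classpath is an assumption */

    JavaVMInitArgs args;
    args.version = JNI_VERSION_1_6;
    args.nOptions = 1;
    args.options = &opt;
    args.ignoreUnrecognized = JNI_FALSE;

    JavaVM *jvm = NULL;
    JNIEnv *env = NULL;
    if (JNI_CreateJavaVM(&jvm, (void **) &env, &args) != JNI_OK) {
        fprintf(stderr, "JNI_CreateJavaVM failed\n");
        return 1;
    }

    /* 3. Run JniTest.main(String[]) as the entry point. */
    jclass cls = (*env)->FindClass(env, "JniTest");
    jmethodID mid = (*env)->GetStaticMethodID(env, cls, "main",
                                              "([Ljava/lang/String;)V");
    jobjectArray argv = (*env)->NewObjectArray(
        env, 0, (*env)->FindClass(env, "java/lang/String"), NULL);
    (*env)->CallStaticVoidMethod(env, cls, mid, argv);

    (*jvm)->DestroyJavaVM(jvm);
    return 0;
}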

This seems to be a bug in cuInit().

Sheldon

What happens when you just call pthread_key_create a couple of times from inside the place where cuInit() fails? Does that call fail? If so, then if cuInit() tries to allocate some TLS storage using that call, it may fail (causing cuInit, in turn, to fail).

That seems to be fine. I tried calling pthread_key_create in a loop about 20 times and all of the calls succeeded, both before and after the cuInit() call. I even tried associating some TLS data with pthread_setspecific, and that works too. I also thought that TLS could be the problem, but apparently it is something else.
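For the record, the probe was along these lines (a sketch; the stored value is arbitrary):

/* TLS probe run around the failing cuInit() call: create a batch of
 * pthread keys and store a value in each. All calls succeed, so TLS
 * key allocation does not appear to be the problem.
 */
#include <pthread.h>
#include <stdio.h>

static void tls_probe(void)
{
    pthread_key_t keys[20];
    int i;

    for (i = 0; i < 20; i++) {
        if (pthread_key_create(&keys[i], NULL) != 0) {
            printf("pthread_key_create failed at %d\n", i);
            return;
        }
        if (pthread_setspecific(keys[i], &keys[i]) != 0) {
            printf("pthread_setspecific failed at %d\n", i);
            return;
        }
    }
    printf("all %d TLS keys created and set OK\n", i);
}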

For now we’ll just live with the workaround and hope the next version of the CUDA driver fixes it.

Thanks.

Sheldon

Could you please file a bug with a repro?