jCUDA - Java library for CUDA Windows support

Hello everyone,

We are pleased to announce the availability of jCUDA, a Java library for interfacing with CUDA and GPU hardware.
The library is supported under Linux and Windows on 32- and 64-bit platforms.

Among its current features, it provides the CUDA API, CUFFT routines, and OpenGL interoperability.
CUBLAS support will be added in the future.

The download contains JavaDoc, examples, and the files necessary to work with the library.
You may download it at: jCUDA (free for non-commercial use).

Best Regards,

Just what I’ve been waiting for! I actually haven’t checked the nVidia forums in about 3 months, but I decided to check today just to see if anything existed for Java with CUDA. It’s a good thing I didn’t do this yesterday instead! :) Also, does every person who wants to run a program made with jCUDA have to download jCUDA, or can it be included in the distribution of the program? In addition, how exactly do you use this library?

Hi StarBP,


You are able to distribute your application with the library, of course. But pay attention to the licensing information (free for non-commercial use; otherwise you may contact us via the jCUDA page, and we will be glad to help).

The file to download contains the necessary files to work with the library + some examples.

We are planning to release a newer version close to the end of March, with an updated object-oriented API.

Hi There;

Thanks to the GLASS Group for porting native CUDA to Java; unfortunately, it seems the library won't go easy on me.

With a little mod in CUDADriver.class (hopefully you don't mind… ^_^)

I'm quite sure I put the library in the right path, or did I miss something? Because from my experience, this has something to do with the library…

Hi anak,

The following DLL files should be under the PATH environment variable:

  1. jcuda.dll (the DLL should be renamed from libjcuda.dll to jcuda.dll; this will be fixed soon).

  2. cufft.dll/cudart.dll (from CUDA installation).

Currently the FFT routines are merged into one DLL, but we are considering separating them.
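If the library still fails to load after the rename, a small diagnostic using only standard JDK calls can show which directories the JVM actually searches (on Windows, java.library.path is initialized from the PATH environment variable):

```java
public class JcudaPathCheck {
    public static void main(String[] args) {
        try {
            // Succeeds only if jcuda.dll (Windows) / libjcuda.so (Linux)
            // sits in one of the java.library.path directories.
            System.loadLibrary("jcuda");
            System.out.println("jcuda native library loaded");
        } catch (UnsatisfiedLinkError e) {
            // List the directories the JVM actually searches, so you can
            // compare them against the directory holding the DLL.
            System.out.println("jcuda not found; java.library.path is:");
            for (String dir : System.getProperty("java.library.path")
                                    .split(java.io.File.pathSeparator)) {
                System.out.println("  " + dir);
            }
        }
    }
}
```

Running this before touching any jCUDA class avoids the static-initializer crash in CUDADriver and tells you exactly where to place the DLL.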

Now fixed.

Both download files for Windows (32/64 bit) were changed.



Here is one question about using jCUDA.

(-) If I put the kernel code in several .cu files, how can I use them?
For example, a.cu and b.cu: a.cu contains the global kernel
function, but internally it uses some functions from b.cu.
So I should run:
nvcc a.cu --cubin
nvcc b.cu --cubin

Then how do I launch the global kernel function in a.cu?

BTW, besides the “examples” directory, is there any other sample code we can reference?

Hello Qinlz,

Your issue is known.

The issue is the inability to create a FAT cubin (one module built from multiple files; NVIDIA should assist with that once the feature is available from the compiler).

If you manage to compile each *.cu file into a *.cubin file, it means they are independent of each other (every function can work on its own).

Otherwise, you may need to include the source of b.cu in a.cu so that the compiler recognizes the necessary functions.
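For instance, using the file names from the question (the helper function is made up for illustration):

```cuda
// b.cu -- device-side helper only, no __global__ entry point of its own
__device__ float helperFromB(float x) { return x * 2.0f; }

// a.cu -- pull b.cu into the same compilation unit, then compile
// only a.cu:  nvcc a.cu --cubin
#include "b.cu"

extern "C" __global__ void kernelA(float *data) {
    // kernelA can now call the helper defined in b.cu
    data[threadIdx.x] = helperFromB(data[threadIdx.x]);
}
```

Only a.cubin needs to be produced and loaded; b.cu is compiled as part of it.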

In any case, you can consider the following options:

  1. It is possible to work with multiple modules loaded in GPU memory (jCUDA also supports this through overloaded functions): load module a.cubin and call function “a”, then load module b.cubin and call function “b”. Multiple modules can coexist; when you specify the function to launch, the CUDA driver knows which module it belongs to.

When you unload a module, it will not be available for later executions.

  2. Have one source file containing multiple global kernel definitions. This is fine: when you get a function from a module, you distinguish between kernels by name (which stays in a readable form if you use the extern “C” qualifier).
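In driver-API C terms (the layer jCUDA wraps), option 1 looks roughly like this; error checking is omitted and the module/function names are taken from the question:

```c
#include <cuda.h>  /* CUDA driver API */

/* Load two independent cubins and fetch one kernel from each.
   Both modules coexist in the context; launching a function
   automatically uses the module it came from. */
void loadBothModules(void) {
    CUmodule modA, modB;
    CUfunction funcA, funcB;

    cuModuleLoad(&modA, "a.cubin");
    cuModuleGetFunction(&funcA, modA, "a");

    cuModuleLoad(&modB, "b.cubin");
    cuModuleGetFunction(&funcB, modB, "b");

    /* ... set parameters and launch funcA / funcB as needed ... */

    /* Unloading a module invalidates its functions for later launches. */
    cuModuleUnload(modB);
    cuModuleUnload(modA);
}
```

The jCUDA overloads mentioned above mirror these driver calls one-to-one.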

I hope this information helps you. If not, please let me know what other issues remain.

Best Regards,



I am trying to use jcuda on Mac and unfortunately getting the same error.

Exception in thread “main” java.lang.UnsatisfiedLinkError: no jcuda in java.library.path

at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1753)

at java.lang.Runtime.loadLibrary0(Runtime.java:823)

at java.lang.System.loadLibrary(System.java:1030)

at jcuda.driver.CUDADriver.<clinit>(CUDADriver.java:909)

at jcuda.CUDA.init(CUDA.java:62)

at jcuda.CUDA.<init>(CUDA.java:42)

at examples.LoadModule.main(LoadModule.java:29)

I tried renaming libjcuda.so and libjcudafft.so to jcuda.so and jcudafft.so and added them to my PATH variable. It still does not work.

I need some help.

Hi neelak,

Under MacOSX, shared libraries are postfixed with *.dylib.

We are preparing a version to support MacOS as well; it should be ready soon.

The usual .so files used on Linux are not supported under Mac.

This is why you get an error stating the library cannot be found (the file name being searched for is different).

You can also check that by tracing the execution of the Java application to see which files it is searching for.
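You can also see the platform-specific file name directly from Java with the standard System.mapLibraryName method; the result is exactly the name System.loadLibrary will look for:

```java
public class LibNameDemo {
    public static void main(String[] args) {
        // The JVM decorates the logical name "jcuda" per platform:
        //   Windows: jcuda.dll
        //   Linux:   libjcuda.so
        //   macOS:   libjcuda.dylib (older Apple JVMs used libjcuda.jnilib)
        System.out.println(System.mapLibraryName("jcuda"));
    }
}
```

This makes it obvious why renaming the Linux .so files does not help on a Mac: the JVM there is searching for a .dylib.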



Hi –

I am currently working on a program in jCUDA and can’t seem to get it to agree with me (for the most part, however, jCUDA is working great!). Since there is very little documentation, I wanted to write to see if you might be able to help me figure out what’s going wrong.

I’ve got a program in C+CUDA that I’m trying to translate to Java+jCUDA for my research. The program has the following lines that I’m trying to replicate:

[codebox]static texture<float4, 2, cudaReadModeElementType> tex;

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float4>();

cudaArray * bind(float4 *d_src, int width, int height, texture<float4, 2, cudaReadModeElementType> &t) {

cudaArray *d_imageArray = 0;

CUDA_SAFE_CALL( cudaMallocArray(&d_imageArray, &channelDesc, width, height) );

CUDA_SAFE_CALL( cudaUnbindTexture(t) );

uint size = width * height * sizeof(float4);

CUDA_SAFE_CALL( cudaMemcpyToArray(d_imageArray, 0, 0, d_src, size, cudaMemcpyDeviceToDevice));

t.filterMode = cudaFilterModeLinear;

t.normalized = false;

CUDA_SAFE_CALL( cudaBindTextureToArray(t, d_imageArray, channelDesc) );

return d_imageArray;

}
[/codebox]



I’ve got a few questions:

  1. Does jCUDA have an equivalent to cudaUnbindTexture?

  2. To have access to “tex” within my Java code, I just use the following line, right? CUtexref tex = cuda.getModuleTexRef(“tex”);

  3. Is the following an accurate translation of the call to cudaMemcpyToArray? (It doesn’t seem to do anything!)

[codebox] CUDA_MEMCPY2D copyData = new CUDA_MEMCPY2D();

    copyData.dstArray = result.getValue();

    copyData.dstXInBytes = 0;

    copyData.dstY = 0;

    copyData.dstMemoryType = CUmemorytype.CU_MEMORYTYPE_ARRAY;

    copyData.srcDevice = src.getValue();

    copyData.srcMemoryType = CUmemorytype.CU_MEMORYTYPE_DEVICE;

    copyData.srcXInBytes = 0;

    copyData.srcY = 0;

    copyData.WidthInBytes = width*SizeOf.FLOAT4;

    copyData.Height = height;
[/codebox]


  4. How do I turn off normalization?

Thanks for your help –

  • jms…

That’s quite some work you’re doing there… but unfortunately, from my point of view, jCUDA is still fairly simple code and needs a lot of work before it catches up with its native counterpart. Since that’s the case, I think some of the code you posted here is not usable from jCUDA just yet. You should probably contact moti if you badly need to modify or extend the jCUDA source code, since the license is freeware (only).

PS: perhaps you can work toward jCUDA 1.2 or better; I’m looking forward to that… :haha:

Hi justo1, anak,

I think both of you are correct.

It is time to release jCUDA 1.2 also to support new features in CUFFT etc.

I hope to have more updates soon about that.

justo1, about your questions:

  • First off, the jCUDA API is mapped to the CUDA C API for simplicity. To be more precise, it is the driver API that is used, not the runtime API (which is what you are experiencing). So if you need further documentation, you can always consult the CUDA reference docs for the exact definition of every function. NVIDIA has a driver-API counterpart for every runtime function call, and some offer greater flexibility.

  • Regarding textures, arrays, etc., it’s a bit difficult to work with them from very high-level languages like Java (.NET offers a bit more support for native interoperability). For example, you can see that copying data to a 2D texture involves the use of a 2D copy function, which in turn takes only pointers to data as parameters.

So there are a few ways to accomplish this task:

  1. Allocate a native pointer and copy data to it using the NativeHelper/Utils class (copy from a Java array to a native pointer), then specify the allocated pointer in the structure parameters. This requires further understanding of how to perform 2D (or 3D) copies and may take some time to master.

  2. Copy the Java array to a device pointer, then perform another copy from the device pointer to the array (since both reside in the GPU memory).
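Option 2 maps to the driver API's CUDA_MEMCPY2D structure, the same one used in the question above. A sketch of the device-to-array half in plain driver-API C, with error checking omitted; note that srcPitch must be set for a linear device source, which the Java translation above leaves at zero and may be why the copy appears to do nothing:

```c
#include <string.h>
#include <cuda.h>  /* CUDA driver API */

/* Copy a width x height float4 image from linear device memory into a
   CUDA array (the device-to-array step of option 2). */
void copyDeviceToArray(CUdeviceptr src, CUarray dst, int width, int height) {
    CUDA_MEMCPY2D cp;
    memset(&cp, 0, sizeof(cp));  /* zero all fields we do not set */

    cp.srcMemoryType = CU_MEMORYTYPE_DEVICE;
    cp.srcDevice     = src;
    cp.srcPitch      = width * sizeof(float) * 4;  /* tightly packed float4 rows */

    cp.dstMemoryType = CU_MEMORYTYPE_ARRAY;
    cp.dstArray      = dst;

    cp.WidthInBytes  = width * sizeof(float) * 4;
    cp.Height        = height;

    cuMemcpy2D(&cp);
}
```

The jCUDA CUDA_MEMCPY2D class mirrors these fields, so the same srcPitch consideration applies there.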

  • In short, about the texture management functions: to attach memory to a given texture, you first need to use one of the cuTexRefSet* functions (either an array, or an address for a linear pointer). After that, you need to set the texture as a parameter of the function you are going to call, through cuParamSetTexRef. Prior to calling any of these functions, you should call cuModuleGetTexRef to get a “reference” to the texture you are about to use. A basic scenario for using textures:
  1. Get a reference to the texture by its name (cuModuleGetTexRef, assuming the module was loaded, of course)

  2. Allocate an array or device pointer and set its data (can be a simple copy or 2D/3D copy)

  3. Set the memory the texture is bound to (cuTexRefSetAddress/cuTexRefSetArray)

  4. Set this texture as parameter to the kernel you are about to launch (cuParamSetTexRef)

  5. Perform any specific texture properties set (filtering etc.)

  6. Execute your code.

  7. Detach/unbind the memory from the texture (cuTexRefSetAddress/Array with a zero argument, NULL)
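The steps above can be sketched in plain driver-API C (the API jCUDA mirrors); error checking is omitted, the module/array are assumed to be set up already, and the texture name and launch dimensions are placeholders:

```c
#include <cuda.h>  /* CUDA driver API */

/* Bind an already-filled CUDA array to the texture "tex" in mod,
   configure it, and launch the kernel with the texture attached. */
void runWithTexture(CUmodule mod, CUfunction kernel, CUarray array) {
    CUtexref tex;

    /* 1. Get a reference to the texture by its name. */
    cuModuleGetTexRef(&tex, mod, "tex");

    /* 3. Bind the array the texture reads from. */
    cuTexRefSetArray(tex, array, CU_TRSA_OVERRIDE_FORMAT);

    /* 5. Texture properties: linear filtering, unnormalized coordinates
       (no CU_TRSF_NORMALIZED_COORDINATES flag = normalization off,
       answering question 4 above). */
    cuTexRefSetFilterMode(tex, CU_TR_FILTER_MODE_LINEAR);
    cuTexRefSetFlags(tex, 0);

    /* 4. Make the texture visible to the kernel. */
    cuParamSetTexRef(kernel, CU_PARAM_TR_DEFAULT, tex);

    /* 6. Launch (block/grid sizes are placeholders). */
    cuFuncSetBlockShape(kernel, 256, 1, 1);
    cuLaunchGrid(kernel, 64, 1);
}
```

Step 7 (unbinding) is simply rebinding the reference to other memory, or to address 0, before the next launch.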

You can consult the documentation for more information about which texture functions perform which action; it really helps, and you can get some insight into the capabilities and limitations of each.

I hope this answers your questions, at least partially.

Feel free to send an email or post a reply at any time (for some reason this post didn’t appear at first).