Problems with CUDA


I’m having some problems with CUDA on my Gentoo PC. I’m currently using CUDA SDK / toolkit 4.2 and gcc 4.6.3. Everything compiled without problems, and when I run deviceQuery or bandwidthTest, both of them pass.
However, when I try to use engine-cuda or libgpucrypto, I get some strange errors:

Gentoo engine-cuda-master # openssl
OpenSSL> engine -t dynamic -pre SO_PATH:/usr/local/lib/engines/ -pre ID:cudamrg -pre LIST_ADD:1 -pre LOAD
(dynamic) Dynamic engine loading support
[Success]: SO_PATH:/usr/local/lib/engines/
[Success]: ID:cudamrg
[Success]: LIST_ADD:1
[Success]: LOAD
Loaded: (cudamrg) OpenSSL engine for AES, DES, IDEA, Blowfish, Camellia and CAST5 with CUDA acceleration
Successfully found 1 CUDA devices (CUDART_VERSION 4020).
Cuda error 10 in file ‘’ in line 123: invalid device ordinal.

Gentoo bin # ./aes_test -m ENC

AES-128-CBC ENC, Size: 16KB

#msg latency(usec) thruput(Mbps)
aes_test: void aes_context::cbc_encrypt(const void*, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, unsigned char*, long unsigned int, long unsigned int, unsigned int, unsigned int): Assertion `cudaGetLastError() == cudaSuccess’ failed.

If anyone could tell me what causes them that’d be awesome :)
Thank you in advance for your help.


Unfortunately, you are not showing any code, so I can only guess that the “invalid device ordinal” is reported by a cudaSetDevice() or similar API call. I tried finding a file in the OpenSSL source browser, but no such file seems to exist. Presumably the code tries to talk to a GPU using a certain device index, but no GPU with that index is known to the driver. If so, you need to find the device id for the GPU(s) in the system, and compare the device id passed to cudaSetDevice() against that list.

(1) Is this a system with multiple GPUs ?
(2) What is the output of lspci | grep nVidia ?
(3) What is the output of ls /dev/nvidia* ?
(4) What is the output of nvidia-smi -q ?
(5) Are you using the display driver required for CUDA 4.2 ?
(6) Did you change GPUs after installing NVIDIA drivers ? If so, try installing the drivers again.
(7) Check that the GPU(s) are correctly seated in their slots
(8) Check that the GPU(s) are connected to the required number of power connectors (6-pin, 8-pin)

I am not familiar with Gentoo. Is this Linux version on CUDA 4.2’s list of supported platforms ? Does this Linux version require the blacklisting of nouveau drivers to use NVIDIA drivers ?

Hello, njuffa!

Both of the codes are publicly available:

If you’d like to take a look, it is HERE, and you were right: line 123 is _CUDA(cudaSetDevice(6));. I tried setting it to 0 and 1, but it didn’t change anything. How do I find the correct device index? deviceQuery from the samples wasn’t much help.

As for libgpucrypto: HERE’s the code. Unfortunately, I have no idea what’s wrong with this one, and I think it would be the most useful to me :(

  1. Nope, I’m using a single GTX 580.
  2. 01:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1)
    01:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
  3. /dev/nvidia0 /dev/nvidiactl
  4. I don’t have nvidia-smi, only nvidia-settings
  5. I’m using the latest version of nvidia-drivers. Do you think compiling an older version might help?
  6. I tried rebuilding nvidia-drivers several times, didn’t help.
    7 & 8 - Done.

CUDA 4.2 is supported by Gentoo:

[I] dev-util/nvidia-cuda-sdk
Available versions: 2.02.0807.1535^b ~2.1.1215.2015^b ~2.2^b ~2.2-r1^b ~2.3^b ~3.0_beta1^b ~3.0^b ~3.1^b ~3.2^b ~4.0^b ~4.1^b (~)4.2 {{+cuda debug +doc emulation +examples opencl}}
Installed versions: 4.2(03:11:37 12/02/12)(cuda debug doc examples opencl)
Description: NVIDIA CUDA Software Development Kit

[I] dev-util/nvidia-cuda-toolkit
Available versions: ~3.2^b 4.0^b ~4.1^b (~)4.2 {{debugger doc profiler}}
Installed versions: 4.2(03:07:17 12/02/12)(debugger profiler -doc)
Description: NVIDIA CUDA Toolkit

As for blacklisting nouveau - I don’t know, because I don’t use it anyway.

Try putting assert(cudaGetLastError() == cudaSuccess); before the kernel calls, say at lines 38, 54 and 63, to check whether the problem occurred before the kernel launch. If the code passes lines 38, 54 and 63, then it’s something in the kernels themselves. If so, try removing code from the AES_cbc_128_encrypt_gpu kernel and see whether it passes then. Comment code in and out of the kernel carefully, so that the compiler doesn’t remove the kernel code as part of its dead-code optimizations.

Also, it seems that both kernel calls use streams, even though the if statement at line 65 indicates there should be two paths: one with streams and one without. Are you sure you’re not using the streamless path with an uninitialized stream?
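
A minimal sketch of the checkpoint idea described above, not the libgpucrypto code itself. The kernel name placeholder_kernel, the buffer size and the launch configuration are made up for illustration; only the CUDA runtime calls are real. A failure at the first checkpoint would point at something that went wrong before the launch, a failure at the second at the launch itself (for example a bad configuration), and a failure at the third at the kernel's execution.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real encryption kernel; it only flips bits.
__global__ void placeholder_kernel(unsigned char *buf, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] ^= 0xFF;
}

int main()
{
    const size_t n = 16 * 1024;            // assumed buffer size, 16 KB
    const unsigned int threads = 256;      // assumed block size
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
    unsigned char *d_buf = NULL;

    cudaError_t err = cudaMalloc((void **)&d_buf, n);
    assert(err == cudaSuccess);
    err = cudaMemset(d_buf, 0, n);
    assert(err == cudaSuccess);

    // Checkpoint 1: an error here was left behind by an earlier call,
    // so the kernel below cannot be the culprit.
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    placeholder_kernel<<<blocks, threads>>>(d_buf, n);

    // Checkpoint 2: reports problems with the launch itself.
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    // Checkpoint 3: waits for the kernel and reports execution errors.
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    cudaFree(d_buf);
    return 0;
}
```

The status is stored in a variable before being asserted on, so the runtime calls still happen if the asserts are compiled out with NDEBUG.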



I put the code in the lines you wanted, and I still get:

AES-128-CBC ENC, Size: 16KB

#msg latency(usec) thruput(Mbps)
aes_test: void aes_context::cbc_encrypt(const void*, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, unsigned char*, long unsigned int, long unsigned int, unsigned int, unsigned int): Assertion `cudaGetLastError() == cudaSuccess’ failed.

I can’t find AES_cbc_128_encrypt_gpu anywhere. As for your second question: I really don’t know. I was just following the manual on the developer’s site, but it’s pretty lame.

Thanks for posting the data. I don’t see anything out of place. Running an older CUDA runtime with the latest drivers should just work (I do that all the time when I work with older versions of CUDA); it is the other way around that doesn’t work (i.e. a newer CUDA runtime on top of an older driver). Nothing appears to point to a problem with CUDA and/or the GPU, and as you stated in your original post, you can successfully run some example CUDA codes.

Looks like you will have to debug this from the code itself. As eyalhir74 says, make sure all CUDA API calls are checked, so you know the reported error isn’t just a follow-up to an earlier one upstream. I am not familiar with these codes. Do they maybe use configuration files that could be misconfigured for your platform, e.g. set up for multi-GPU while your machine has a single GPU? Is there an online forum for this software? If so, you may want to ask there to see whether anybody has run into the same or a similar problem.

They do use config files. I already tried contacting the developer, but he didn’t respond :( There’s also very little documentation, and so far I haven’t seen anyone else with this problem.

Good news everyone: I managed (with some help) to run engine-cuda. Turns out I had to change _CUDA(cudaSetDevice(6)); to 0, as I did before, and COMPILE AGAIN (I don’t know why I didn’t do that before - I’m such a noob). Anyway, if someone knows how to fix libgpucrypto I’d be most grateful.

Hardcoding the device ID in a cudaSetDevice() call inside an application strikes me as a bad idea, even when coding the device ID as 0, which probably is always valid if there is at least one CUDA device, but I am not sure that this is guaranteed. A better strategy would be to construct a list of all CUDA devices at startup, then pick the most appropriate one for the app from the list, and / or allow the user to specify the device ID of the GPU to use.
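
A minimal sketch of that strategy, assuming the CUDA runtime API. The selection policy (take the device with the most multiprocessors unless the user passes an index as the first command-line argument) is an assumption made for illustration and is not what engine-cuda does.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        fprintf(stderr, "No usable CUDA device: %s\n",
                err != cudaSuccess ? cudaGetErrorString(err)
                                   : "device count is 0");
        return 1;
    }

    // Build the list of devices; valid ordinals are 0 .. count - 1.
    int best = -1;
    int best_sms = -1;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        printf("device %d: %s (compute %d.%d, %d multiprocessors)\n",
               i, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount);
        // Assumed default policy: prefer the most multiprocessors.
        if (prop.multiProcessorCount > best_sms) {
            best_sms = prop.multiProcessorCount;
            best = i;
        }
    }

    // Optional user override: first command-line argument is the ordinal.
    int chosen = best;
    if (argc > 1) {
        char *end = NULL;
        long requested = strtol(argv[1], &end, 10);
        if (end == argv[1] || *end != '\0' ||
            requested < 0 || requested >= count) {
            fprintf(stderr, "Invalid device index '%s' (valid: 0..%d)\n",
                    argv[1], count - 1);
            return 1;
        }
        chosen = (int)requested;
    }
    if (chosen < 0) {
        fprintf(stderr, "Could not query any CUDA device\n");
        return 1;
    }

    err = cudaSetDevice(chosen);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                chosen, cudaGetErrorString(err));
        return 1;
    }
    printf("using device %d\n", chosen);
    return 0;
}
```

Built with nvcc and run without arguments, this lists the devices and picks one; run with an index, it validates that index against the device count instead of trusting it.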

As for the problem with libgpucrypto, the first step should be to dump the exact CUDA status at the point of failure, as right now all that is known is that it is not cudaSuccess. Then debug from there.
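
One possible way to dump the status, sketched under the assumption that the failing assert can be replaced locally. CUDA_REPORT is a hypothetical helper macro, not part of libgpucrypto or the CUDA toolkit; cudaGetErrorString() is the real runtime call that turns a status code into text. The example deliberately passes an out-of-range ordinal to cudaSetDevice(), which is expected to fail with the same kind of “invalid device ordinal” error seen earlier in this thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: evaluates a CUDA runtime call once and, on failure,
// prints the numeric status, its description and the call site, then exits.
#define CUDA_REPORT(call)                                              \
    do {                                                               \
        cudaError_t status_ = (call);                                  \
        if (status_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s failed with CUDA error %d: %s\n", \
                    __FILE__, __LINE__, #call, (int)status_,           \
                    cudaGetErrorString(status_));                      \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

int main()
{
    int count = 0;
    CUDA_REPORT(cudaGetDeviceCount(&count));

    // Deliberate mistake: 'count' is one past the last valid ordinal.
    CUDA_REPORT(cudaSetDevice(count));

    // Drop-in replacement for assert(cudaGetLastError() == cudaSuccess);
    // not reached here because the call above exits first.
    CUDA_REPORT(cudaGetLastError());
    return 0;
}
```

Swapping a line like CUDA_REPORT(cudaGetLastError()); in for the assert in cbc_encrypt would show which error is pending at that point, which is a more useful starting point than the bare assertion failure.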