I've run into some issues running CUDA code on Linux.
I am not sure whether they are related to the super.
I checked LD_LIBRARY_PATH, and both RC3.1 and RC4.0 show the same issues.
Can anyone help me?
Thank you very much.
Issue 1: when I run an SDK example, it shows 'Runtime API error : all CUDA-capable devices are busy or unavailable'.
[jwang@super-496 ~]$ cd NVIDIA_GPU_Computing_SDK_4.0/C/bin/linux/release
[jwang@super-496 release]$ echo $LD_LIBRARY_PATH
:/opt/cuda4RC/cuda/lib64
[jwang@super-496 release]$ ./histogram
[histogram]
./histogram Starting…
CUDA device [Tesla C1060] has 30 Multi-Processors, Compute 1.3
Initializing data…
…allocating CPU memory.
…generating input data
…allocating GPU memory and copying input data
main.cpp(102) : cudaSafeCall() Runtime API error : all CUDA-capable devices are busy or unavailable.
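One thing that might be worth ruling out for issue 1 (an assumption; the log above does not confirm it) is that the device is in an exclusive or prohibited compute mode, or is already held by another process. The following is a minimal sketch that prints each device's compute mode and then tries to create a context on it. The helper `computeModeName` is hypothetical, and the sketch assumes a toolkit that provides `cudaDeviceGetAttribute`; with older toolkits such as the one in this thread, the same value is available as `cudaDeviceProp::computeMode`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: human-readable name for a compute mode value.
const char* computeModeName(int mode) {
    switch (mode) {
        case cudaComputeModeDefault:          return "Default";
        case cudaComputeModeExclusive:        return "Exclusive";
        case cudaComputeModeProhibited:       return "Prohibited";
        case cudaComputeModeExclusiveProcess: return "Exclusive Process";
        default:                              return "Unknown";
    }
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        int mode = -1;
        err = cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, dev);
        if (err != cudaSuccess) {
            std::printf("device %d: attribute query failed: %s\n",
                        dev, cudaGetErrorString(err));
            continue;
        }
        std::printf("device %d: compute mode = %s\n", dev, computeModeName(mode));

        // cudaFree(0) forces context creation, which is where a busy or
        // unavailable device would be reported.
        cudaSetDevice(dev);
        err = cudaFree(0);
        std::printf("device %d: context creation: %s\n",
                    dev, cudaGetErrorString(err));
    }
    return 0;
}
```

If the mode is not Default, `nvidia-smi` run by an administrator is the usual place to inspect or change it.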
Issue 2: my CUDA code worked correctly two weeks ago, but now shows an "invalid device function" error after launching a kernel, even a very simple or empty one.
[jwang@super-496 HHT]$ ./Test_HHT_CUDA
Cuda error: INDEX_CUDA: invalid device function.
I'm using an mbp5 with Mac OS X SL 10.6.7, a GeForce 9400M (256 MB),
CMake 2.8 and CUDA 4.0.
I get exactly the same error. I can't run a kernel, not even an empty one, without hitting it.
The relevant part is:
KERNEL:
    __global__ void kernel_SetToOne(float* a, int nx, int ny, int pitch) {
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        int r = blockIdx.y * blockDim.y + threadIdx.y;
        if (c < nx && r < ny) {
            int index = c * pitch + r;
            a[index] = 1.0f;
        }
    }
CALL:
    int nx = 256;
    int ny = 256;
    float* matrix;
    cudaMalloc((void**)&matrix, nx * ny * sizeof(float));
    kernel_SetToOne<<<dim3(16, 16), dim3(16, 16)>>>(matrix, nx, ny, nx);
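To see exactly where the error is raised, it can help to check the runtime status right after the launch. Below is a minimal sketch built around the kernel above; the `CHECK_CUDA` macro is a hypothetical helper, not part of the CUDA API, and the host-side verification loop is added only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error string and abort on failure.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(e_));                     \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

__global__ void kernel_SetToOne(float* a, int nx, int ny, int pitch) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (c < nx && r < ny) {
        a[c * pitch + r] = 1.0f;
    }
}

int main() {
    const int nx = 256, ny = 256;
    const size_t bytes = nx * ny * sizeof(float);

    float* matrix = NULL;
    CHECK_CUDA(cudaMalloc((void**)&matrix, bytes));

    kernel_SetToOne<<<dim3(16, 16), dim3(16, 16)>>>(matrix, nx, ny, nx);
    CHECK_CUDA(cudaGetLastError());       // launch-time errors, e.g. "invalid device function"
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel executes

    // Copy the result back and count elements the kernel did not set.
    float* host = (float*)std::malloc(bytes);
    CHECK_CUDA(cudaMemcpy(host, matrix, bytes, cudaMemcpyDeviceToHost));
    int bad = 0;
    for (int i = 0; i < nx * ny; ++i) {
        if (host[i] != 1.0f) ++bad;
    }
    std::printf("%d elements not set to 1\n", bad);

    std::free(host);
    CHECK_CUDA(cudaFree(matrix));
    return bad != 0;
}
```

An "invalid device function" reported by the `cudaGetLastError()` line points at how the binary was built rather than at the kernel body.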
NVidia never provided a solution to this very common problem. Bad NVidia!!
Try the things mentioned here: The Official NVIDIA Forums | NVIDIA
I found a solution.
I had been compiling with the “-arch=sm_13” option, but the GPU in an mbp5 only supports compute capability 1.1:
deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce 9400M"
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 265945088 bytes
Multiprocessors x Cores/MP = Cores: 2 (MP) x 8 (Cores/MP) = 16 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.10 GHz
Concurrent copy and execution: No
Run time limit on kernels: Yes
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce 9400M
I'm not sure, but it seems to be an architecture mismatch: code built for sm_13 cannot run on a compute capability 1.1 device.
After changing every
-arch=sm_13
entry to
-arch=sm_11
and recompiling the whole thing, it works fine.
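Rather than reading the capability off the deviceQuery output by hand, the flag can be derived from the installed device. This is only a sketch: `archFlag` is a hypothetical helper, and it assumes the build should target exactly the compute capability the runtime reports for each device.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: build the nvcc flag for a given compute capability.
std::string archFlag(int major, int minor) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "-arch=sm_%d%d", major, minor);
    return std::string(buf);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::printf("device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        // Print the device name and the flag matching its capability.
        std::printf("device %d (%s): %s\n", dev, prop.name,
                    archFlag(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

For the GeForce 9400M listed above, which reports capability 1.1, the helper yields `-arch=sm_11`, matching the fix described here.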
I suddenly got that error too, without much info. The weird thing is that the code was running perfectly before I added a kernel. The added kernel caused this error. I switched the code back to its initial state and the invalid device function error persists!
Just to answer my own question: the function that caused the above is this:
__global__ void kFindNumBlocks(cudaGridSimulation_t cGridSimIn, cudaNBSimulation_t cSimIn) {
    if (threadIdx.x == 0) {
        uint flatCounter = 0;
        for (int i = 0; i < cGridSimIn.xs * cGridSimIn.ys; ++i) {
            uint zOffset = cGridSimIn.offsets[i];
            if (zOffset > 0) {
                uint numZs = getNumZs2(zOffset, 256);
                for (int j = 0; j < numZs; ++j) {
                    flatCounter++;
                }
            }
        }
        cSimIn.numBlocks = flatCounter;
        printf("nd: %d\n", cSimIn.numBlocks);
    }
}
The kernel call was commented out while the error was present. The error disappeared when I deleted (commented out) the above code from the .cu file… hmm.
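The post does not say why removing the function helped. One possibility worth ruling out (an assumption, not something confirmed here) is the build target: device-side `printf` requires compute capability 2.0 or higher, so a file built for such an architecture may contain no binary that an older GPU can load. A minimal sketch for asking the runtime whether a kernel has a usable image on the current device follows; `probeKernel` is a hypothetical stand-in for a kernel compiled in the same .cu file as the one in question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in: place a kernel like this in the .cu file under test.
__global__ void probeKernel() {}

int main() {
    cudaFuncAttributes attr;
    // Fails if the executable holds no binary or PTX image of the kernel
    // that is compatible with the current device.
    cudaError_t err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        std::printf("no usable image for this device: %s\n",
                    cudaGetErrorString(err));
        return 1;
    }
    // Versions are encoded as major * 10 + minor (for example 11 for sm_11).
    std::printf("binaryVersion = %d, ptxVersion = %d\n",
                attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

Comparing the printed versions with the capability reported by deviceQuery shows whether the compile flags match the hardware.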