cudaLaunchKernel returned (0x30)

Hi,

I refreshed and upgraded my system, but I am having difficulty running my CUDA code. The new environment is Ubuntu 18.04 with CUDA 10. Here is the nvidia-smi output:

Tue Dec 4 09:19:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro M1200 Off | 00000000:01:00.0 Off | N/A |
| N/A 49C P0 N/A / N/A | 423MiB / 4046MiB | 0% Default |
±------------------------------±---------------------±---------------------+

±----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1749 G /usr/lib/xorg/Xorg 129MiB |
| 0 1908 G /usr/bin/gnome-shell 143MiB |
| 0 2611 G …uest-channel-token=18113097727299320948 44MiB |
| 0 4212 C …ayci/workspace/cpp/xDNN/Debug/test_xDNN 92MiB |
+-----------------------------------------------------------------------------+

The executable and library build without errors, but the test results are completely wrong. When I debugged the code in Nsight, I got this gdb output:

Coalescing of the CUDA commands output is off.
$1 = 0xff
The target endianness is set automatically (currently little endian)
No source file named SyncedCudaMem.cpp.
No source file named ImagePreProcessor.cpp.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 3, main (argc=1, argv=0x7fffffffddc8) at /home/bozkalayci/workspace/cpp/xDNN/tests/main.cpp:33
33 testImagePreProcessor();
[New Thread 0x7fffbd67d700 (LWP 4232)]
[New Thread 0x7fffbcca8700 (LWP 4233)]
[New Thread 0x7fffb7fff700 (LWP 4234)]

Thread 1 "test_xDNN" hit Breakpoint 2, xDNN::ImagePreProcessor::apply_cuda_ (this=0x7fffffffda80) at /home/bozkalayci/workspace/cpp/xDNN/tools/ImagePreProcessor.cpp:133
133 temp_data_.set_gpu_data(in_data_.mutable_gpu_data(), in_size_);

Thread 1 "test_xDNN" hit Breakpoint 1, xDNN::SyncedCudaMemory::to_gpu (this=0x7fffffffdc30) at /home/bozkalayci/workspace/cpp/xDNN/cuda/SyncedCudaMem.cpp:153
153 x_gpu_memset(size_, 0, gpu_ptr_);
[Launch of CUDA Kernel 0 (memset32<<<(233,1,1),(512,1,1)>>>) on Device 0, level 0]
Cuda API error detected: cudaLaunchKernel returned (0x30)

I am stuck on this unknown API error. Any help or ideas would be very welcome.
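Note that the debugger prints the return value in hexadecimal, so 0x30 is decimal 48, not 30. In the CUDA 10.0 `cudaError_t` enum, 30 is `cudaErrorUnknown`, while 48 is `cudaErrorNoKernelImageForDevice` ("no kernel image is available for execution on the device"). The sketch below only illustrates that conversion. It uses a hand-copied two-entry table, and the `describe_cuda10_error` helper is hypothetical. In real code, `cudaGetErrorName()` and `cudaGetErrorString()` from the CUDA runtime API give the authoritative name for the installed toolkit, and numeric values may differ in other toolkit versions.

```cpp
#include <map>
#include <string>

// Hypothetical helper: decodes an error code printed in hex by the debugger
// (e.g. "0x30"). Only two CUDA 10.0 enum values are listed here; real code
// should call cudaGetErrorName() instead of hard-coding values.
std::string describe_cuda10_error(const std::string& hex_code) {
    // std::stoi with base 16 accepts an optional "0x" prefix.
    const int code = std::stoi(hex_code, nullptr, 16);
    static const std::map<int, std::string> cuda10_errors = {
        {30, "cudaErrorUnknown"},
        {48, "cudaErrorNoKernelImageForDevice"},
    };
    const auto it = cuda10_errors.find(code);
    if (it == cuda10_errors.end()) {
        return "code " + std::to_string(code) + " (not in this table)";
    }
    return it->second + " (" + std::to_string(code) + ")";
}
```

For example, `describe_cuda10_error("0x30")` returns `"cudaErrorNoKernelImageForDevice (48)"`.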

Can you send us a repro sample?

I ran into this problem today using CUDA 10 on a Tesla T4 card.

I finally found that the problem was a wrong GPU architecture setting. When compiling, I used compute_61, which targets the Tesla P4 (compute capability 6.1), but that is not correct for the T4 (compute capability 7.5). Switching to compute_75 solved the problem.
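The following is a minimal sketch of why the architecture setting matters, based on the binary-compatibility rules in the CUDA C Programming Guide. Machine code (SASS) built for `sm_XY` runs only on devices with the same major version X and a minor version of at least Y. PTX built for `compute_XY`, by contrast, can be JIT-compiled by the driver for any device of compute capability X.Y or higher. A build that embeds only `sm_61` machine code therefore has no kernel image a T4 (7.5) can run. If `compute_61` PTX had also been embedded, the driver could have JIT-compiled it instead. The `Target` struct and both helper functions are hypothetical illustrations, not part of the CUDA toolkit. The flag strings follow nvcc's `-gencode arch=...,code=...` syntax.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one code target embedded in a fat binary.
struct Target {
    int major;    // e.g. 6 for sm_61 / compute_61
    int minor;    // e.g. 1 for sm_61 / compute_61
    bool is_ptx;  // true: PTX (compute_XY), false: SASS (sm_XY)
};

// True if at least one embedded target can run on a device of compute
// capability dev_major.dev_minor.
bool has_kernel_image(const std::vector<Target>& targets, int dev_major, int dev_minor) {
    for (const Target& t : targets) {
        const bool same_major_newer_minor = dev_major == t.major && dev_minor >= t.minor;
        if (!t.is_ptx && same_major_newer_minor) {
            return true;  // SASS: same major version, equal or newer minor
        }
        if (t.is_ptx && (dev_major > t.major || same_major_newer_minor)) {
            return true;  // PTX: the driver can JIT-compile for an equal or newer CC
        }
    }
    return false;
}

// Builds nvcc flags: SASS for every listed capability, plus PTX for the last
// one so that newer GPUs can still JIT-compile the kernels.
// Assumes the capabilities are listed in ascending order.
std::string gencode_flags(const std::vector<std::pair<int, int>>& capabilities) {
    std::string flags;
    for (const auto& cc : capabilities) {
        const std::string v = std::to_string(cc.first) + std::to_string(cc.second);
        if (!flags.empty()) flags += ' ';
        flags += "-gencode arch=compute_" + v + ",code=sm_" + v;
    }
    if (!capabilities.empty()) {
        const auto& last = capabilities.back();
        const std::string v = std::to_string(last.first) + std::to_string(last.second);
        flags += " -gencode arch=compute_" + v + ",code=compute_" + v;
    }
    return flags;
}
```

For example, `has_kernel_image({{6, 1, false}}, 7, 5)` is false: only sm_61 SASS is embedded, and the device is a T4. `gencode_flags({{6, 1}, {7, 5}})` returns `-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75`, which would cover both the P4 and the T4. The same rule applies to the Quadro M1200 (compute capability 5.0) in the original post.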