I noticed that if JIT compilation of embedded PTX code fails (e.g. because I’m using too much shared memory), the program just does nothing at all instead of giving me an error.
Example:
nvcc -arch=compute_11 -code=sm_11 --use_fast_math test.cu -o test
ptxas /tmp/tmpxft_00000c90_00000000-2_test.ptx, line 123; warning : Double is not supported. Demoting to float
ptxas error : Entry function '_Z13kernel_sharedPfjS_S_' uses too much shared data (0x9190 bytes + 0x10 bytes system, 0x4000 max)
It doesn’t fail silently. If you are using the driver API, you will get an explicit error returned when you try to load the PTX. In the runtime API, a JIT compilation failure kills the context, and any further operations on that context will produce errors, which can be trapped with the usual runtime error checking.
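For the driver API case, something along these lines should surface the error at load time. This is only a rough sketch, not code from this thread: the helper name load_ptx, the 8 KB log buffer, and the idea of passing the PTX file on the command line are my own assumptions. It asks the JIT compiler for its error log through cuModuleLoadDataEx and prints it if the load fails.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Hypothetical helper (not from the thread): JIT-compile a PTX string and
// load it as a module. On failure, `log` receives the JIT error text.
static CUresult load_ptx(const char *ptx, CUmodule *mod,
                         char *log, size_t log_size)
{
    CUjit_option opts[] = { CU_JIT_ERROR_LOG_BUFFER,
                            CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES };
    void *vals[] = { log, (void *)log_size };
    log[0] = '\0';
    return cuModuleLoadDataEx(mod, ptx, 2, opts, vals);
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file.ptx\n", argv[0]);
        return 1;
    }

    // Read the whole PTX file into a NUL-terminated buffer.
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *ptx = (size >= 0) ? (char *)malloc((size_t)size + 1) : NULL;
    if (!ptx || fread(ptx, 1, (size_t)size, f) != (size_t)size) {
        fprintf(stderr, "could not read %s\n", argv[1]);
        fclose(f);
        free(ptx);
        return 1;
    }
    ptx[size] = '\0';
    fclose(f);

    // Use the primary context of device 0 (an assumption for this sketch).
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS ||
        cuDeviceGet(&dev, 0) != CUDA_SUCCESS ||
        cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS ||
        cuCtxSetCurrent(ctx) != CUDA_SUCCESS) {
        fprintf(stderr, "could not set up a CUDA context\n");
        free(ptx);
        return 1;
    }

    CUmodule mod;
    char log[8192];
    CUresult rc = load_ptx(ptx, &mod, log, sizeof(log));
    if (rc != CUDA_SUCCESS) {
        const char *name = NULL;
        cuGetErrorName(rc, &name);
        fprintf(stderr, "PTX load failed: %s\n%s\n",
                name ? name : "unknown error", log);
    } else {
        printf("PTX loaded\n");
        cuModuleUnload(mod);
    }

    free(ptx);
    cuDevicePrimaryCtxRelease(dev);
    return rc == CUDA_SUCCESS ? 0 : 1;
}
```

This needs to be linked against the driver library (for example with -lcuda). Which error code and log text you get will depend on the driver version, so I have not shown any expected output.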
Hmm. What would be a suitable operation to produce this error in the runtime API? I am calling cudaEventRecord() and other kernels after the failed launch, and they don’t return any errors.
avidday@cuda:~$ nvcc -arch=compute_13 -code=compute_13 -Xptxas="-v" shmerror.cu -o shmerror
avidday@cuda:~$ ./shmerror
Legal case starting
Legal case completed
Illegal case starting
FAILURE, code 0 : no error in shmerror.cu, line 74
The slightly unusual output from the assert is what happens when the context is killed.
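For reference, the "usual runtime error checking" mentioned earlier in the thread can be sketched roughly like this. This is not the shmerror.cu source, which isn't posted here; the CHECK_CUDA macro and the small kernel are hypothetical and only borrow the message format from the output above. The idea is to check the launch itself with cudaGetLastError() and then force a synchronization, so that any error raised while the kernel runs is reported too.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report a failing runtime call with file and line.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "FAILURE, code %d : %s in %s, line %d\n", \
                    (int)err_, cudaGetErrorString(err_),              \
                    __FILE__, __LINE__);                              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: reverses 256 values through a small shared array
// that is well within the shared memory limit.
__global__ void kernel_shared(float *out)
{
    __shared__ float buf[256];
    buf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = buf[255 - threadIdx.x];
}

int main()
{
    float *d_out = NULL;
    CHECK_CUDA(cudaMalloc((void **)&d_out, 256 * sizeof(float)));

    kernel_shared<<<1, 256>>>(d_out);
    CHECK_CUDA(cudaGetLastError());       // errors detected at launch
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised during execution

    float h_out[256];
    CHECK_CUDA(cudaMemcpy(h_out, d_out, sizeof(h_out),
                          cudaMemcpyDeviceToHost));
    printf("out[0] = %.1f\n", h_out[0]);  // prints "out[0] = 255.0"

    CHECK_CUDA(cudaFree(d_out));
    return 0;
}
```

Checking only later calls such as cudaEventRecord() may not be enough, which is why the sketch checks right after the launch and again after the synchronization.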