Kernels Fail to Launch on Jetson TX-1

After porting and building my codebase on the Jetson TX-1 (running linux tegra v3.10.697),
I’m seeing multiple CUDA kernel launch failures.

Using cuda-gdb I see the following backtrace:

#0 0x0023a2a0 in cudart::configData::addArgument(void const*, unsigned int, unsigned int) ()
#1 0x002253ec in cudart::cudaApiSetupArgument(void const*, unsigned int, unsigned int) ()
#2 0x0024f1a0 in cudaSetupArgument ()
#3 0x001e81f8 in __device_stub__Z20ivgpu_threshold_grayPK6uchar4PS_jj (__par0=0xe982a600, __par1=0xe9838800, __par2=14400, __par3=1)
at /tmp/tmpxft_0000594c_00000000-4_ivgpu_Utility.cudafe1.stub.c:1

I’m also seeing “CUDA_EXCEPTION_4, Warp Illegal Instruction” when calling a different
function with debugging enabled.

All of this code runs fine on other platforms (Quadro, Tesla, etc.).

Any help would be greatly appreciated!
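
For anyone trying to narrow down a failure like this, here is a minimal sketch of checking for errors right at the launch site, so the failing kernel is reported by file and line rather than surfacing at a later call. The `CHECK_CUDA` macro and `fill_kernel` are hypothetical names for illustration (this is not the ivgpu_threshold_gray kernel from the backtrace), and the element count 14400 just mirrors `__par2` above. Only `cudaMalloc`, `cudaFree`, `cudaGetLastError`, `cudaDeviceSynchronize` and `cudaGetErrorString` are actual runtime API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line and exit on any CUDA runtime error.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in kernel: writes `value` into each of the n bytes.
__global__ void fill_kernel(unsigned char* out, unsigned int n,
                            unsigned char value)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main()
{
    const unsigned int n = 14400;  // assumption: echoes __par2 in the backtrace
    unsigned char* d_out = NULL;
    CHECK_CUDA(cudaMalloc(&d_out, n));

    fill_kernel<<<(n + 255) / 256, 256>>>(d_out, n, 1);
    CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CHECK_CUDA(cudaFree(d_out));
    std::puts("launch ok");
    return 0;
}
```

Checking both calls separates a launch that was rejected outright from a kernel that started and then faulted.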

I was able to “fix” the CUDA_EXCEPTION_4 by changing the cast applied to a cudaMalloc'ed buffer when launching one specific kernel. With the buffer declared as

unsigned char* pBuffer = ( … )

I changed the kernel argument from

(unsigned int*)( pBuffer )

to

(uchar4*)( pBuffer )

I'm not sure why this made a difference; each element should still be 4 bytes.
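
The "still 4 bytes" assumption can be checked at compile time. This is only a sketch: it confirms that the two element types have the same size and alignment, so the cast should not change the address or the number of bytes per element. It does not explain why the exception went away. `kSameSize` and `kSameAlign` are hypothetical names, and the file needs to be built as C++11 (for example `nvcc -std=c++11`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>  // brings in the uchar4 vector type

// Hypothetical constants: compare the two pointee types used in the cast.
constexpr bool kSameSize  = sizeof(uchar4) == sizeof(unsigned int);
constexpr bool kSameAlign = alignof(uchar4) == alignof(unsigned int);

static_assert(kSameSize,  "uchar4 and unsigned int differ in size");
static_assert(kSameAlign, "uchar4 and unsigned int differ in alignment");

int main()
{
    // Print the sizes so the result is visible at run time as well.
    std::printf("sizeof(uchar4) = %u, sizeof(unsigned int) = %u\n",
                (unsigned)sizeof(uchar4), (unsigned)sizeof(unsigned int));
    return 0;
}
```

If both assertions hold, the only thing the cast changes is the static type of the argument.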

I still have the original issue. Other kernels launch successfully, but I’m
getting this error consistently.

It looks like the issue was caused by statically linking against the CUDA/NPP runtime libraries.
Once I switched to dynamic linking, the issues went away.
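
For reference, nvcc selects the runtime library with its `--cudart` option (`static` is the default). Below is a rough sketch for comparing the two link modes on one small source file. The file name `link_check.cu` and the kernel `noop_kernel` are hypothetical, `-arch=sm_53` assumes the TX1's compute capability of 5.3, and the NPP libraries have to be switched separately on the link line (not shown).

```cuda
// Build the same file both ways and compare the output:
//   nvcc -arch=sm_53 -cudart static link_check.cu -o link_check_static
//   nvcc -arch=sm_53 -cudart shared link_check.cu -o link_check_shared
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel: only exercises the launch path.
__global__ void noop_kernel() {}

int main()
{
    int runtime_version = 0;
    int driver_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    cudaDriverGetVersion(&driver_version);
    std::printf("runtime %d, driver %d\n", runtime_version, driver_version);

    noop_kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // did the launch get accepted?
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // did the kernel finish cleanly?
    std::printf("launch: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

If the static build fails where the shared build succeeds, that points at the link setup rather than at the kernel code.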