Hi,
I am new to CUDA. I have a MATLAB-CUDA application written with CUDA Toolkit 3.2, and I am trying to run it on a machine with CUDA Toolkit 4.0. When the code tries to create the MEX file, I get the following error:
nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)
nvcc error : ‘ptxas’ core dumped
I passed --ptxas-options=-v to nvcc, and this is the output:
ptxas info : Compiling entry function ‘_Z18_kernel_scale_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 25 registers, 256+16 bytes smem, 65536 bytes cmem[0], 40 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z23_kernel_ssGRBFNorm_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 23 registers, 256+16 bytes smem, 65536 bytes cmem[0], 28 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z19_kernel_ssGRBF_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 22 registers, 256+16 bytes smem, 65536 bytes cmem[0], 24 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z18_kernel_sdNDP_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
nvcc error : ‘ptxas’ died due to signal 11 (Invalid memory reference)
nvcc error : ‘ptxas’ core dumped
CUDA preprocessing [nvcc] failed
This was on a GeForce GTX 590 with CUDA Toolkit 4.0.
The same code builds correctly on a Tesla with CUDA Toolkit 3.2. The output there is:
ptxas info : Compiling entry function ‘_Z18_kernel_scale_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 25 registers, 256+16 bytes smem, 65536 bytes cmem[0], 40 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z23_kernel_ssGRBFNorm_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 25 registers, 256+16 bytes smem, 65536 bytes cmem[0], 28 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z19_kernel_ssGRBF_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 25 registers, 256+16 bytes smem, 65536 bytes cmem[0], 24 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z18_kernel_sdNDP_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 39 registers, 256+16 bytes smem, 65536 bytes cmem[0], 20 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z19_kernel_sdGRBF_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 37 registers, 256+16 bytes smem, 65536 bytes cmem[0], 20 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z19_kernel_sdConv_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 38 registers, 256+16 bytes smem, 65536 bytes cmem[0], 20 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z17_kernel_nLen_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 21 registers, 256+16 bytes smem, 65536 bytes cmem[0], 20 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z19_kernel_inhib2_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 13 registers, 256+16 bytes smem, 65536 bytes cmem[0], 8 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z19_kernel_inhib1_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 23 registers, 256+16 bytes smem, 65536 bytes cmem[0], 16 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z17_kernel_gMax_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 21 registers, 256+16 bytes smem, 65536 bytes cmem[0], 20 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z17_kernel_cMax_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 21 registers, 256+16 bytes smem, 65536 bytes cmem[0], 20 bytes cmem[1]
ptxas info : Compiling entry function ‘_Z17_kernel_cAvg_dfltjjPKfS0_9_OutTablejPK7ushort8jjjPK10_LayerData’ for ‘sm_10’
ptxas info : Used 22 registers, 256+16 bytes smem, 65536 bytes cmem[0], 28 bytes cmem[1]
CUDA preprocessing successful
Can anyone please help me with this error? Thank you!