Strange PTX error on compile: fatal error in ptx: Arguments mismatch for instruction 'shl'

Hi,
I have no idea what to do with this error; any suggestions would be much appreciated.
I'm running 32-bit Windows XP.

1>------ Rebuild All started: Project: cppIntegration, Configuration: Debug Win32 ------
1>Deleting intermediate and output files for project 'cppIntegration', configuration 'Debug|Win32'
1>Compiling with CUDA Build Rule...
1>"C:\CUDA\bin\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files\Microsoft Visual Studio 8\VC\bin" -Xcompiler "/EHsc /W3 /nologo /Od /Zi /MTd " -I"C:\CUDA\include" -I"…/…/common/inc" -maxrregcount=32 --compile -o "Debug\cppIntegration.cu.obj" "c:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\aes one\cppIntegration.cu"
1>cppIntegration.cu
1>tmpxft_00000c58_00000000-3_cppIntegration.cudafe1.gpu
1>tmpxft_00000c58_00000000-8_cppIntegration.cudafe2.gpu
1>ptxas C:\DOCUME~1\RESEAR~2\LOCALS~1\Temp/tmpxft_00000c58_00000000-4_cppIntegration.ptx, line 312; error : Arguments mismatch for instruction 'shl'
1>ptxas C:\DOCUME~1\RESEAR~2\LOCALS~1\Temp/tmpxft_00000c58_00000000-4_cppIntegration.ptx, line 313; error : Arguments mismatch for instruction 'shr'
1>ptxas fatal : Ptx assembly aborted due to errors
1>Compiling...
1>cppIntegration_gold.cpp
1>main.cpp
1>Generating Code...
1>Linking...
1>LINK : fatal error LNK1181: cannot open input file '.\Debug\cppIntegration.cu.obj'
1>Build log was saved at "file://c:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\aes one\Debug\BuildLog.htm"
1>cppIntegration - 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

__global__ void criptare_aesctr(unsigned char* TX, unsigned char* IV, unsigned char* RK, unsigned char* SB, unsigned int maxim)
{
    unsigned int i = (blockIdx.y*gridDim.x+blockIdx.x)*blockDim.x+threadIdx.x;
    int j, r, ix;
    unsigned int cx = i*16;
    unsigned char temp[4];
    unsigned char state[16];

    .....

    TX[cx+ix] = state[ix];    // this line generates the error
}

Since this does not seem to involve hand-coded PTX, it certainly looks like a compiler bug: the compiler appears to generate an invalid PTX instruction. If this happens with the CUDA 4.2 toolchain, please file a bug and attach a self-contained repro case. A link to the bug reporting form can be found on the registered developer website. If you are using an older CUDA version, I would suggest upgrading to 4.2. If you are blocked with 4.2, you may also want to check out the CUDA 5.0 preview (fair warning: alpha-quality software).

Thank you for your help, and sorry for the inconvenience.

It seems the compiler won't let me copy the local unsigned char array to the global TX memory. However, if I store state[16], which is out of bounds, it compiles with no error.
How can I copy state to TX, please?

__global__ void criptare_aesctr(unsigned char* TX, unsigned char* IV, unsigned char* RK, unsigned char* SB, unsigned int maxim)
{
    unsigned int i = (blockIdx.y*gridDim.x+blockIdx.x)*blockDim.x+threadIdx.x;
    int j, r, ix;
    unsigned int cx = i*16;
    unsigned char temp[4];
    unsigned char state[16];

    for (ix = 0; ix < 16; ix++) {
        state[ix] = state[ix] ^ TX[cx+ix];
        //TX[cx+ix] = state[ix] ^ TX[cx+ix];   // error if uncommented
        TX[cx+ix] = state[16];                 // out of bounds, but compiles without error
        //TX[cx+ix] = state[ix];               // error if uncommented
    }
}

Please file a bug against the compiler. The compiler team may be able to suggest a workaround once they have determined the root cause of the issue you observe.
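
For the self-contained repro case requested earlier in the thread, something along the following lines might serve as a starting point. It is only a sketch: the file name `repro.cu`, the kernel name `repro_kernel`, the helper `xor_block_reference`, the `nblocks` parameter, and the choice to fill `state` from `IV` are all assumptions, not taken from the original project. Whether this reduced kernel still triggers the ptxas error has to be checked on the affected toolchain; if it does not, pieces of the original kernel would need to be added back until it does.

```cpp
// repro.cu (hypothetical file name): sketch of a self-contained repro case.
#include <assert.h>
#include <stdio.h>
#include <string.h>

// CPU reference for the XOR of one 16-byte block; used to check the GPU result.
// Illustrative helper, not part of the original post.
static void xor_block_reference(unsigned char* tx, const unsigned char* state)
{
    assert(tx != NULL && state != NULL);
    for (int ix = 0; ix < 16; ++ix)
        tx[ix] = (unsigned char)(tx[ix] ^ state[ix]);
}

#ifdef __CUDACC__   // everything below is only seen by nvcc
#include <cuda_runtime.h>

// Reduced kernel: same indexing, local byte array, and store as in the post.
// Filling state from IV is an assumption made to keep the local array live.
__global__ void repro_kernel(unsigned char* TX, const unsigned char* IV,
                             unsigned int nblocks)
{
    unsigned int i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
    if (i >= nblocks) return;
    unsigned int cx = i * 16;
    unsigned char state[16];
    int ix;
    for (ix = 0; ix < 16; ix++)
        state[ix] = IV[ix];
    for (ix = 0; ix < 16; ix++) {
        state[ix] = state[ix] ^ TX[cx + ix];
        TX[cx + ix] = state[ix];   // the store reported to trigger the error
    }
}

int main()
{
    const unsigned int nblocks = 64;          // arbitrary test size
    const unsigned int nbytes = nblocks * 16;
    unsigned char h_tx[nbytes], h_ref[nbytes], h_iv[16];
    unsigned char *d_tx = NULL, *d_iv = NULL;
    unsigned int k;

    for (k = 0; k < nbytes; k++) h_tx[k] = (unsigned char)(k * 7 + 3);
    for (k = 0; k < 16; k++)     h_iv[k] = (unsigned char)(0xA0 + k);

    // Expected result, computed on the CPU.
    memcpy(h_ref, h_tx, nbytes);
    for (k = 0; k < nblocks; k++) xor_block_reference(h_ref + k * 16, h_iv);

    cudaMalloc((void**)&d_tx, nbytes);
    cudaMalloc((void**)&d_iv, 16);
    cudaMemcpy(d_tx, h_tx, nbytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_iv, h_iv, 16, cudaMemcpyHostToDevice);

    repro_kernel<<<1, nblocks>>>(d_tx, d_iv, nblocks);

    // This blocking cudaMemcpy waits for the kernel to finish before copying back.
    cudaError_t err = cudaMemcpy(h_tx, d_tx, nbytes, cudaMemcpyDeviceToHost);
    cudaFree(d_tx);
    cudaFree(d_iv);

    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s\n", memcmp(h_tx, h_ref, nbytes) == 0 ? "PASS" : "FAIL");
    return 0;
}
#endif
```

Building this with the same architecture flag as in the log, for example `nvcc -arch sm_10 -keep repro.cu`, should also leave the intermediate .ptx file in the working directory, so the `shl` and `shr` lines that ptxas rejects can be quoted in the bug report.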