Parameters get broken when passed to a __host__ function in .cu

Compiler: MSVC 2008

CUDA SDK: 2.2

Hi there, I am still struggling with the SDK upgrade… :-(

Currently, all parameters end up screwed up whenever I call any function defined in a .cu file.

Here is the call and the code generated for it:

init_md5_cuda(&(data->gpu), g->hash_i, perm::charset, perm::charset_len);

000000014000D305  mov		 rdx,qword ptr [g (140122D10h)] 

000000014000D30C  add		 rdx,0D180h 

000000014000D313  mov		 rcx,qword ptr [data] 

000000014000D31B  add		 rcx,0A0h 

000000014000D322  mov		 r9d,dword ptr [perm::charset_len (1401227BCh)] 

000000014000D329  lea		 r8,[perm::charset (140122800h)] 

000000014000D330  call		init_md5_cuda (14002E210h)

Here is how the parameters are read inside the callee:

000000014002E210  mov		 qword ptr [rsp+8],rbx 

000000014002E215  mov		 qword ptr [rsp+10h],rsi 

000000014002E21A  push		rdi  

000000014002E21B  sub		 rsp,30h 

000000014002E21F  mov		 edi,r9d 

	cudaMemcpy("target_hash", hash_i, sizeof(int)*4, cudaMemcpyHostToDevice);

000000014002E222  mov		 r9d,1 

000000014002E228  mov		 rsi,r8 

000000014002E22B  mov		 rbx,rcx 

000000014002E22E  lea		 rcx,[string "target_hash" (1400E36A8h)] 

000000014002E235  lea		 r8d,[r9+0Fh] 

000000014002E239  call		cudaMemcpy (14002E63Ch)

init_md5_cuda is declared extern "C".

The caller passes the values in registers, but the callee seems to fetch them from the stack :-S Not sure why :-S

It's extern "C", so shouldn't everything go on the stack? :-S

Same with CUDA 2.3.

Not sure I understand your problem.

extern "C" says nothing about how parameters are passed (that is architecture-dependent, OS-dependent, compiler-dependent and compiler-flag-dependent). It only means the function is named according to C rules rather than C++ rules: no name mangling, and therefore no overloading and no templates.
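To make the point about linkage concrete, here is a minimal sketch. The function names are made up for illustration and are not from the code in this thread.

```cpp
#include <cassert>

// Hypothetical functions for illustration only; they do not appear in the thread.

// C linkage: the symbol is emitted under the plain name "add_c".
// This says nothing about whether arguments travel in registers or on the stack.
extern "C" int add_c(int a, int b) { return a + b; }

// C++ linkage: the symbol name is mangled to encode the parameter types,
// which is what makes overloading possible.
int add_cpp(int a, int b) { return a + b; }
double add_cpp(double a, double b) { return a + b; }

// Both kinds of function are called the same way from C++ code.
inline void check_linkage_example() {
    assert(add_c(2, 3) == 5);
    assert(add_cpp(2, 3) == 5);
    assert(add_cpp(1.5, 2.0) == 3.5);
}
```

Trying to add a second `extern "C"` overload of `add_c` would be rejected by the compiler, because both would need the same unmangled symbol name.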

Your code looks correct and follows the VS2008 calling convention for x64:

http://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
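For reference, the Microsoft x64 convention passes the first four integer or pointer arguments in RCX, RDX, R8 and R9, and the caller reserves 32 bytes of stack ("home" or "shadow" space) that the callee may use as scratch. That matches the listing in the question: the stores to [rsp+8] and [rsp+10h] save RBX and RSI into that space; they do not load arguments from it. Below is a rough sketch with a hypothetical stand-in function. The real signature of init_md5_cuda is not shown in the thread, so the struct and parameter types are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in type; the real one is not shown in the thread.
struct GpuState {
    const unsigned int* target_hash;
    const char* charset;
    int charset_len;
};

// Register comments describe the Microsoft x64 convention only;
// other platforms (e.g. System V on Linux) assign registers differently.
extern "C" void init_md5_stub(GpuState* gpu,              // 1st argument: RCX
                              const unsigned int* hash_i, // 2nd argument: RDX
                              const char* charset,        // 3rd argument: R8
                              int charset_len)            // 4th argument: R9D
{
    gpu->target_hash = hash_i;
    gpu->charset = charset;
    gpu->charset_len = charset_len;
}

// Small self-check: every argument must arrive in the callee unchanged.
inline void check_stub_example() {
    GpuState state = {0, 0, 0};
    unsigned int hash[4] = {1, 2, 3, 4};
    const char charset[] = "abc";
    init_md5_stub(&state, hash, charset, 3);
    assert(state.target_hash == hash);
    assert(state.charset == charset);
    assert(state.charset_len == 3);
}
```

In the caller's disassembly the same order is visible: RCX holds `&(data->gpu)`, RDX holds `g->hash_i`, R8 holds `perm::charset` and R9D holds `perm::charset_len`.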

Maybe you should check the compiler options that were used for both your .cu file and .cpp file and make sure they are compatible…

I worked around it by specifying __fastcall in the .cu file and in the headers. :-S
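One way to keep such an annotation in a single place is a macro in the header that both the .cpp and the .cu file include, so the two translation units always see the same declaration. The sketch below assumes a made-up macro name (MD5_CALL), a made-up header name and a made-up function. Note that Microsoft documents __fastcall as accepted and ignored when targeting x64, so on a 64-bit build the keyword itself should not change how arguments are passed.

```cpp
#include <cassert>

// MD5_CALL is a hypothetical macro name, not something from the thread.
#if defined(_MSC_VER)
#  define MD5_CALL __fastcall   // MSVC spelling; ignored when targeting x64
#else
#  define MD5_CALL              // other compilers: use the platform default
#endif

// ---- would live in a shared header, e.g. a hypothetical md5_cuda.h ----
#ifdef __cplusplus
extern "C" {
#endif
int MD5_CALL md5_charset_len(const char* charset);   // hypothetical function
#ifdef __cplusplus
}
#endif

// ---- would live in the .cu file, which includes the same header ----
extern "C" int MD5_CALL md5_charset_len(const char* charset)
{
    int n = 0;
    while (charset[n] != '\0') ++n;   // count characters up to the terminator
    return n;
}

// ---- would live in the .cpp caller, which also includes the header ----
inline void check_shared_header_example() {
    assert(md5_charset_len("abc") == 3);
}
```

Because the declaration and the definition both go through the same macro, a mismatch between the two files becomes a compile or link error instead of silently corrupted arguments.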