cuMemcpyDtoH freeze

Hi all,

I have a problem downloading data from device memory when I use shared memory in the calculation.

cuFuncSetBlockShape(Function, maxThreadsUsed, 1, 1);
cuLaunch(Function);
cuMemcpyDtoH((void*)Result, DevicePtr, memSize);

This runs OK: the data is downloaded from device memory and the main program thread continues
(but the calculated data is not correct, because the kernel needs to use data from shared memory, …)

cuFuncSetSharedSize(Function, sharedSize);
cuFuncSetBlockShape(Function, maxThreadsUsed, 1, 1);
cuLaunch(Function);
cuMemcpyDtoH((void*)Result, DevicePtr, memSize); // <- this freezes the program when cuFuncSetSharedSize is called beforehand

Any ideas what could cause cuMemcpyDtoH to freeze?

First check that the kernel finished; you probably already have a crash in there.

Each cuLaunch(Function) call returns CUDA_SUCCESS. It is checked like this:

if( (cuRes = cuLaunch(Function)) != CUDA_SUCCESS) return Fail("Failed launch", cuRes);

Every cu… function is called like that and none of them fails; only cuMemcpyDtoH freezes when cuFuncSetSharedSize has been called beforehand.

The main calculation algorithm is:

1 create context

2 load module

3 store data1 [shared data]

4 store data2 [main data]

5 load func1 -> setup args func1 and launch (func1 doesn’t use shared data)

6 load func2 -> setup args func2 and launch (func2 uses shared data)

7 load data2 <-- freeze

When I do not call cuFuncSetSharedSize, the load in step 7 does not freeze.

When the main calculation algorithm is like this, everything works perfectly:

1 create context

2 load module

3 store data2 [main data]

4 load func1 -> setup args func1 and launch (func1 doesn’t use shared data)

5 load and free data2

6 detach context

7 create context

8 load module

9 store data1 [shared data]

10 store data2 [main data]

11 load func2 -> setup args func2[cuFuncSetSharedSize] and launch

12 load and free data2

13 free data1

14 detach context

You were right…

I am developing on Linux, and there is a problem with the cuLaunch call: it always returns CUDA_SUCCESS!

How can I determine what error is caused by cuLaunch?

Any other cu… call after cuLaunch causes a freeze :(

The problem must be somewhere in the kernel functions, in the shared memory accesses.
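
For context, here is a rough sketch of how the byte count given to cuFuncSetSharedSize shows up inside a kernel. This is not the actual kernel from this thread: the name func2 is borrowed from the steps above, while the parameters, the staging loop and the indexing are assumptions made purely for illustration.

```cuda
// Hypothetical kernel: parameter names and layout are illustrative only.
extern "C" __global__ void func2(const float *shared_src, // "data1" in global memory
                                 float *data,             // "data2", updated in place
                                 int n_shared,            // elements staged in shared memory (> 0)
                                 int n)                   // elements in data
{
    // Dynamically sized shared array. Its size in bytes is whatever the host
    // passed to cuFuncSetSharedSize, so that value must be at least
    // n_shared * sizeof(float).
    extern __shared__ float cache[];

    // Cooperative copy from global to shared memory, strided by block size.
    for (int i = threadIdx.x; i < n_shared; i += blockDim.x)
        cache[i] = shared_src[i];

    // Every thread of the block must reach this barrier. Placing it inside a
    // branch that only some threads take can hang the kernel.
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] += cache[idx % n_shared]; // index stays inside the allocation
}
```

Two things worth checking in a kernel of this shape are that no index into the shared array exceeds the size requested on the host, and that __syncthreads() is reached by all threads of a block. Either mistake can crash or hang the kernel, and the host only notices at the next blocking call.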

Error checking cuLaunch only tells you that the kernel launched, not that it completed. It is likely that the kernel either crashed after launch and hosed your context, or that it is stuck in a loop or otherwise hung, which effectively blocks the memcpy call. Try adding a cuCtxSynchronize call after the launch and query what it returns.
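
A minimal host-side sketch of that suggestion, reusing the legacy driver API calls from the original post. The helper name launch_and_fetch and its parameter list are made up for this example, and printing the raw numeric CUresult is just one simple way to report the failure.

```cuda
#include <cuda.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): configure and launch the
 * kernel, wait for it to finish, and only then copy the result back. */
static CUresult launch_and_fetch(CUfunction func, int threads,
                                 unsigned int sharedBytes,
                                 void *result, CUdeviceptr devPtr,
                                 size_t memSize)
{
    CUresult rc;

    if ((rc = cuFuncSetSharedSize(func, sharedBytes)) != CUDA_SUCCESS) return rc;
    if ((rc = cuFuncSetBlockShape(func, threads, 1, 1)) != CUDA_SUCCESS) return rc;

    /* CUDA_SUCCESS here only means the kernel was queued for execution. */
    if ((rc = cuLaunch(func)) != CUDA_SUCCESS) return rc;

    /* Blocks until the kernel has finished. If the kernel crashed, the
     * failure is reported by this call instead of by cuLaunch. */
    rc = cuCtxSynchronize();
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "kernel failed after launch, CUresult = %d\n", (int)rc);
        return rc;
    }

    /* The kernel is known to have completed, so the copy cannot be waiting on it. */
    return cuMemcpyDtoH(result, devPtr, memSize);
}
```

If the kernel is stuck in a loop rather than crashed, cuCtxSynchronize will block as well, but that at least shows the hang is in the kernel and not in the copy itself.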