If this topic was answered before, my apologies, but the search function doesn't work for me.
My question is about how to launch kernel(s) to process data whose length is not a power of two.
I know two approaches.
The first is to launch one kernel on the closest power of two (below the data length) and then a second kernel on the rest of the data, which avoids launching idle threads.
The second is to launch a single kernel that computes the thread id (tid) and to put an IF statement before any other code:
if ( tid < Total_length )
The first approach never launches an idle thread, but it requires two kernel launches, and the second kernel must be set up to access the rest of the data (with an offset). The second approach avoids the second kernel and any offset calculation, but it launches idle threads in the last block.
I'm just looking for a clean and efficient method. What do you guys usually do in this situation?
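For concreteness, the two approaches could be sketched roughly as follows. This is only an illustration under assumptions: the element-wise scaling kernels `scaleGuarded` and `scaleExact`, the launch helpers, the host wrapper `scaleOnGpu`, and the block size of 256 are all hypothetical names and values, not taken from any real code. The split in the first approach is made at the last full block rather than at a power of two, since with a fixed block size the idle threads only appear when the length is not a multiple of the block size.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Approach 2 kernel (hypothetical): the guard makes the surplus threads
// of the last block skip every memory access.
__global__ void scaleGuarded(float* data, int n, float factor)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        data[tid] *= factor;
    }
}

// Approach 1 kernel (hypothetical): no guard, so the caller must launch
// exactly one thread per element.
__global__ void scaleExact(float* data, float factor)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    data[tid] *= factor;
}

// Approach 2: a single launch with the block count rounded up.
void launchGuarded(float* d_data, int n, float factor, int threads = 256)
{
    if (n <= 0) return;
    int blocks = (n + threads - 1) / threads;  // ceiling division
    scaleGuarded<<<blocks, threads>>>(d_data, n, factor);
}

// Approach 1: all full blocks first, then one partial block on the tail,
// reached through a pointer offset.
void launchSplit(float* d_data, int n, float factor, int threads = 256)
{
    if (n <= 0) return;
    int fullBlocks = n / threads;
    int rest = n - fullBlocks * threads;  // 0 <= rest < threads
    if (fullBlocks > 0) {
        scaleExact<<<fullBlocks, threads>>>(d_data, factor);
    }
    if (rest > 0) {
        scaleExact<<<1, rest>>>(d_data + fullBlocks * threads, factor);
    }
}

// Host wrapper for trying both variants; error checking is omitted for brevity.
std::vector<float> scaleOnGpu(std::vector<float> host, float factor, bool split)
{
    if (host.empty()) return host;
    int n = static_cast<int>(host.size());
    size_t bytes = host.size() * sizeof(float);
    float* d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, host.data(), bytes, cudaMemcpyHostToDevice);
    if (split) {
        launchSplit(d_data, n, factor);
    } else {
        launchGuarded(d_data, n, factor);
    }
    // This blocking copy on the default stream waits for the kernels to finish.
    cudaMemcpy(host.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return host;
}
```

In the guarded version every global memory access sits inside the `if`, so the extra threads in the last block touch nothing. Calling `scaleOnGpu(v, 3.0f, false)` and `scaleOnGpu(v, 3.0f, true)` on a 1000-element vector should give the same result either way.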
I changed one of my kernels to use this approach, but the leftover threads make the program crash. I'm trying to figure it out. The error message is:
Microsoft C++ exception: cudaError at memory location 0x0003c928…
Microsoft C++ exception: cudaError at memory location 0x0003c924…
It's strange, because the leftover threads do nothing but declare variables at startup. I ran emudebug, and all the non-idle threads seem to run fine, but when I step over the first idle thread the program crashes immediately. What can I do?