Hi. I am working on a file converter program that will use CUDA to accelerate the process by using one thread per file. Currently, the program works fine in device emulation mode. However, when I compile without emulation mode, it runs without crashing, but the image generated by the conversion is wrong. I checked the output in VS2008, and found:
I’m immediately confused. Don’t C++ exceptions terminate the program if they are not caught? Ignoring this, in every one of my CUDA function calls I check the return value to make sure there is no error, and none of these checked calls reported a failure. I chose to break on C++ exceptions, and found that the first exception occurred at cudaMalloc():
// The unfiltered file contents of each hnd file
void* d_hnds;
uint* d_hndsOffsets;
uint* hndsSizes;
if (cudaMalloc((void**)&d_hndsOffsets, sizeof(uint) * numFiles) != cudaSuccess)
{
    printf("cudaMalloc() failed\n");
    return;
}
However, even though there was a C++ exception, cudaMalloc() returned 0 (success). I wanted to get the cudaError from the exception, so I tried to catch it, but I couldn’t. I moved on to the second exception, and found it occurred at a kernel call:
The two most common reasons (that I’ve seen, anyway) for getting errors when you move from emulation to actually running code on the device are: (1) accidentally passing a host pointer to a device method, or vice versa; and (2) race conditions in your kernel.
It usually helps if you post a bit more code…but perhaps you meant to cudaMalloc the d_hnds pointer, instead of d_hndsOffsets? If not, attach some more code to your post and maybe someone can help you find the problem.
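In the meantime, here is a rough sketch of how one might check a kernel launch, since a kernel call has no return value to test. Everything in it is an assumption for illustration: `cudaOk`, `fillKernel`, and `runChecked` are made-up names, and the kernel is only a placeholder for the real conversion. On CUDA 2.3 the synchronizing call would be cudaThreadSynchronize() rather than cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): print the CUDA error
// string and report failure if `err` is anything other than cudaSuccess.
static bool cudaOk(cudaError_t err, const char* what)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Placeholder kernel standing in for the real conversion kernel.
__global__ void fillKernel(unsigned int* data, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;   // guard threads past the end of the array
}

// Launches the placeholder kernel and checks every step (assumes n > 0).
bool runChecked(unsigned int* h_out, unsigned int n)
{
    unsigned int* d_data = NULL;
    if (!cudaOk(cudaMalloc((void**)&d_data, n * sizeof(unsigned int)), "cudaMalloc"))
        return false;

    const unsigned int threads = 64;
    fillKernel<<<(n + threads - 1) / threads, threads>>>(d_data, n);

    // cudaGetLastError() reports launch problems (bad configuration, no
    // matching device code); the synchronize call reports errors raised
    // while the kernel was actually running.
    bool ok = cudaOk(cudaGetLastError(), "kernel launch")
           && cudaOk(cudaDeviceSynchronize(), "kernel execution")
           && cudaOk(cudaMemcpy(h_out, d_data, n * sizeof(unsigned int),
                                cudaMemcpyDeviceToHost), "cudaMemcpy");
    cudaFree(d_data);
    return ok;
}
```

Calling `runChecked(out, 256)` on a host array of 256 elements should return true on a working device; if it returns false, the message printed to stderr says which step failed.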
I did mean d_hndsOffsets. The program works like this: I read in all of the hnd files to convert (to raw) and store their contents in one big block of device memory. I pass the pointer to this memory to the kernel, along with a block of device memory holding the offsets of the different hnd files (not every hnd file is the same size). Previously I had been using a pointer-to-pointer scheme to reference each hnd file, but I thought it could be the reason for my problems, so I removed it.
So far, I’ve re-created the project file using the wizard at http://sourceforge.net/projects/cudavswizard/ (CUDA_VS_Wizard_W32.2.0.zip), and it still doesn’t work. I’ve also made sure I don’t dereference device pointers in host code or do pointer arithmetic on device pointers in host code. I no longer get any C++ exceptions at all, and the errors have disappeared (including the “Invalid Device Function”), but the resulting image is still incorrect (while the result produced in emulation mode is still correct).
I will attach the entire source file here to see if anyone can help me. Any help is greatly appreciated.
hnd_to_raw_cuda.cu.txt (8.57 KB)
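For readers without the attachment, the layout described above (one flat device buffer plus a table of per-file offsets, one thread per file) can be sketched roughly as follows. This is not the attached source: `sumFilesKernel` and `sumFilesOnDevice` are hypothetical names, and summing each file's bytes is only a stand-in for the real hnd-to-raw conversion.

```cuda
#include <vector>
#include <cuda_runtime.h>

typedef unsigned int uint;

// Placeholder for the per-file conversion: each thread sums one file's bytes.
__global__ void sumFilesKernel(const unsigned char* hnds, const uint* offsets,
                               const uint* sizes, uint* sums, uint numFiles)
{
    uint file = blockIdx.x * blockDim.x + threadIdx.x;
    if (file >= numFiles) return;               // guard threads past the last file
    const unsigned char* p = hnds + offsets[file];
    uint s = 0;
    for (uint i = 0; i < sizes[file]; ++i) s += p[i];
    sums[file] = s;
}

// Hypothetical host wrapper: packs the files, runs the kernel, and returns
// false on any CUDA error.
bool sumFilesOnDevice(const std::vector<std::vector<unsigned char> >& files,
                      std::vector<uint>& sums)
{
    uint numFiles = (uint)files.size();
    std::vector<uint> offsets(numFiles), sizes(numFiles);
    std::vector<unsigned char> packed;
    for (uint f = 0; f < numFiles; ++f) {       // offsets are a running total of sizes
        offsets[f] = (uint)packed.size();
        sizes[f] = (uint)files[f].size();
        packed.insert(packed.end(), files[f].begin(), files[f].end());
    }
    sums.assign(numFiles, 0);
    if (packed.empty()) return true;            // nothing to process

    unsigned char* d_hnds = NULL;
    uint *d_offsets = NULL, *d_sizes = NULL, *d_sums = NULL;
    size_t tableBytes = numFiles * sizeof(uint);
    bool ok =
        cudaMalloc((void**)&d_hnds, packed.size()) == cudaSuccess &&
        cudaMalloc((void**)&d_offsets, tableBytes) == cudaSuccess &&
        cudaMalloc((void**)&d_sizes, tableBytes) == cudaSuccess &&
        cudaMalloc((void**)&d_sums, tableBytes) == cudaSuccess &&
        cudaMemcpy(d_hnds, &packed[0], packed.size(), cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_offsets, &offsets[0], tableBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_sizes, &sizes[0], tableBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const uint threads = 64;
        sumFilesKernel<<<(numFiles + threads - 1) / threads, threads>>>(
            d_hnds, d_offsets, d_sizes, d_sums, numFiles);
        // The blocking cudaMemcpy waits for the kernel to finish, so it also
        // surfaces errors raised during execution.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(&sums[0], d_sums, tableBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_hnds); cudaFree(d_offsets); cudaFree(d_sizes); cudaFree(d_sums);
    return ok;
}
```

The sketch reads the buffer one byte at a time on purpose. If a kernel instead casts `hnds + offsets[file]` to a wider type such as `uint*` or `float*`, the offset generally has to be a multiple of that type's size on the device, which is one way code that works in emulation can misbehave on real hardware.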
Here is some information on my setup which I did not include before:
OS: Windows XP SP2
CPU: Intel Core 2 Duo E6550 @ 2.33GHz
GPU: GeForce 9500 GT
RAM: 3.25 GB
CUDA Version: 2.3
Compiler for Host code: Microsoft Visual C++ 2008 Express Edition