Hello All:
I have a multi-threaded application with two source files: a .cpp and a .cu. In the .cpp file I do Windows API programming (creating windows, dialogs, controls, etc.). Several threads run in the .cpp file (I create them with CreateThread(…)), and in each thread routine I create three variables: one host array and two device arrays (CUDA_Image_Input and CUDA_Image_Output). I allocate the host array with malloc and the device arrays with cudaMalloc, all inside the thread routine. I check the status of the malloc and cudaMalloc calls, and they return success. When I inspect these variables in the Visual Studio debugger, the device pointers show non-zero values. I then copy data to one of the device arrays with cudaMemcpy(), which also returns success.

However, when I pass the device arrays to a C++ wrapper function in my .cu file, the pointer values show up as 0x00000, which makes them useless. What am I missing here? I am using VS 2008 on Windows 7, 64-bit. The code is below. Any advice would be greatly appreciated by this newbie.
Thread routine in my .cpp file:
DWORD WINAPI CapturedFieldServiceThread(PVOID id)
{
unsigned short * CUDA_Image_Input, * CUDA_Image_Output, * hostBuffer; //my arrays
int error; //error code
cudaError_t stat; //cuda error
int size = sizeof(unsigned short)* pxd_imageXdims((UINT)id + 1) * pxd_imageYdims((UINT)id + 1);
//malloc memory for host array
hostBuffer = (unsigned short *)malloc(size);
if(hostBuffer == NULL)
MessageBox(NULL,"couldn't malloc host buffer","malloc", MB_OK|MB_TASKMODAL);
//allocate memory for device arrays
stat = cudaMalloc((void **)&CUDA_Image_Input, size);
if(stat != cudaSuccess)
MessageBox(NULL,"couldn't cudaMalloc CUDA_Image_Input","cudaMalloc", MB_OK|MB_TASKMODAL);
stat = cudaMalloc((void **)&CUDA_Image_Output, size);
if(stat != cudaSuccess)
MessageBox(NULL,"couldn't cudaMalloc CUDA_Image_Output","cudaMalloc", MB_OK|MB_TASKMODAL);
for (;;)
{
//
// Wait for signal.
//
WaitForSingleObject((HANDLE)Window_Process[(UINT)id].evt, INFINITE);
// ......(do some stuff).....
//copy host information to device
stat = cudaMemcpy((unsigned short *)CUDA_Image_Input,
(unsigned short *)hostBuffer,
size,
cudaMemcpyHostToDevice);
//call C++ wrapper in .cu file
if(Window_Process[(UINT)id].vert)
flipVertically(
CUDA_Image_Input,
CUDA_Image_Output,
Window_Process[(UINT)id].imageX,
Window_Process[(UINT)id].imageY);
stat = cudaMemcpy((unsigned short *)hostBuffer,
(unsigned short *)CUDA_Image_Output,
size,
cudaMemcpyDeviceToHost);
// .......(do some stuff).....
} //end for (;;)
} //end CapturedFieldServiceThread
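One way to narrow down whether the device pointers or the kernel launch are at fault is to compare what comes back into hostBuffer against a CPU reference. This is a minimal sketch under stated assumptions: flipVerticallyCPU and firstMismatch are hypothetical helper names (not part of CUDA or of the code above), and the image is assumed to be tightly packed, row-major unsigned short pixels (imageX columns by imageY rows, no padding between rows).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU reference for a vertical flip: output row r receives
// input row (rows - 1 - r). Assumes a tightly packed, row-major image of
// unsigned short pixels (no padding between rows).
void flipVerticallyCPU(const unsigned short *in, unsigned short *out,
                       int cols, int rows)
{
    assert(cols >= 0 && rows >= 0);
    const std::size_t rowElems = static_cast<std::size_t>(cols);
    for (int r = 0; r < rows; ++r)
        std::memcpy(out + r * rowElems,
                    in + (rows - 1 - r) * rowElems,
                    rowElems * sizeof(unsigned short));
}

// Returns the index of the first pixel where gpuOut differs from the CPU
// reference computed from the original input, or -1 if every pixel matches.
long firstMismatch(const unsigned short *gpuOut, const unsigned short *in,
                   int cols, int rows)
{
    assert(cols >= 0 && rows >= 0);
    const std::size_t n = static_cast<std::size_t>(cols) * rows;
    if (n == 0)
        return -1;
    std::vector<unsigned short> expected(n);
    flipVerticallyCPU(in, &expected[0], cols, rows);
    for (std::size_t i = 0; i < n; ++i)
        if (gpuOut[i] != expected[i])
            return static_cast<long>(i);
    return -1;
}
```

Because the second cudaMemcpy overwrites hostBuffer with the output, the input frame would need to be saved first (for example into a hypothetical inputCopy buffer) before calling firstMismatch(hostBuffer, inputCopy, imageX, imageY); a result of -1 means the GPU output matches the reference.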
C++ wrapper function in my .cu file:
/*
*
*Wrapper function that calls the GPU kernel to flip an image vertically
*
*/
void flipVertically(unsigned short * CUDAInput,
unsigned short * CUDAOutput,
int cols,
int rows)
{
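//round the thread count up to a multiple of 32 (the warp size);
//flipVert must then ignore threads whose index is >= cols
int cuda_Cols = ((cols + 31) / 32) * 32;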
//call the CUDA kernel to flip the image
flipVert <<<rows ,cuda_Cols >>> ((unsigned short *)CUDAInput, (unsigned short *)CUDAOutput,rows,cols);
}
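The original padding expression in flipVertically() does not round up as intended: (cols + 32) % 32 equals cols % 32, so cols = 40 yields 48 threads rather than 64. Separately, launching one block per row ties the block size to the image width, and a block may not exceed the device's maxThreadsPerBlock (512 on compute capability 1.x, 1024 on 2.x and later; the value can be read from cudaGetDeviceProperties). A launch that exceeds it fails with an invalid-configuration error, which is only visible if cudaGetLastError() is checked right after the launch. The host-only C++ sketch below shows the rounding and a fixed-size 2D block layout; roundUp, originalPadding, LaunchConfig and makeLaunchConfig are hypothetical names, and the 32x8 block shape is an assumption.

```cpp
#include <cassert>

// Round n up to the next multiple of m (m > 0), e.g. roundUp(40, 32) == 64.
int roundUp(int n, int m)
{
    assert(m > 0 && n >= 0);
    return ((n + m - 1) / m) * m;
}

// The expression from the original flipVertically(): (cols + 32) % 32 equals
// cols % 32, so this adds cols % 32 instead of padding to a multiple of 32.
int originalPadding(int cols)
{
    return cols + ((cols + 32) % 32);
}

// Hypothetical 2D launch configuration: fixed 32x8 blocks (256 threads, within
// the 512-thread limit of compute capability 1.x) and enough blocks to cover
// every pixel of a cols x rows image.
struct LaunchConfig { int gridX, gridY, blockX, blockY; };

LaunchConfig makeLaunchConfig(int cols, int rows)
{
    LaunchConfig cfg;
    cfg.blockX = 32;
    cfg.blockY = 8;
    cfg.gridX = roundUp(cols, cfg.blockX) / cfg.blockX;
    cfg.gridY = roundUp(rows, cfg.blockY) / cfg.blockY;
    return cfg;
}
```

In the .cu file this would translate to dim3 grid(cfg.gridX, cfg.gridY) and dim3 block(cfg.blockX, cfg.blockY) passed to flipVert<<<grid, block>>>. Since the kernel body is not shown, it is an assumption that flipVert would be rewritten to match, for example computing col = blockIdx.x * blockDim.x + threadIdx.x and row = blockIdx.y * blockDim.y + threadIdx.y and skipping threads with col >= cols or row >= rows.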