Hi,
I have a problem with cudaMalloc in my kernel wrapper. Here is the relevant code:
int main() {
  …
  for (i = 0; i < TIMES; i++) {
    kernel_wrapper();
  }
  …
  …
}
// End main

// Start kernel wrapper.
void kernel_wrapper() {
  …
  …
  // par_output is an (N+1) x (N+1) host array which is already initialized.
  CUT_SAFE_CALL(cudaMalloc((void**)&output, sizeof(float)*(N+1)*(N+1)));
  CUT_SAFE_CALL(cudaMalloc((void**)&op1, sizeof(float)*(N+1)*(N+1)));
  // ** SEGFAULTS HERE, I BELIEVE BECAUSE THE MALLOC DID NOT SUCCEED.
  CUT_SAFE_CALL(cudaMemcpy(output, par_output, sizeof(float)*(N+1)*(N+1), cudaMemcpyHostToDevice));
  CUT_SAFE_CALL(cudaMemcpy(op1, par_output, sizeof(float)*(N+1)*(N+1), cudaMemcpyHostToDevice));
  for (int k = 0; k < M; k++) {
    // I am passing output and op1, the (N+1) x (N+1) arrays, to this kernel for some computation.
    actualKernel<<< grid, threads >>>(N, M, output, op1, h, blx, bly, TBx, TBy);
    cudaMemcpy(op1, output, sizeof(float)*(N+1)*(N+1), cudaMemcpyDeviceToDevice);
  }
  …
  CUT_SAFE_CALL(cudaFree(output));
  CUT_SAFE_CALL(cudaFree(op1));
}
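Since actualKernel is not shown, here is only a hedged illustration of one thing worth ruling out. If the grid is rounded up to whole blocks, some threads land outside the (N+1) x (N+1) array and must be skipped by a bounds guard in the kernel. The C sketch below assumes (hypothetically) that blx and bly are the number of blocks per grid dimension, that TBx and TBy are the threads per block, and that the grid is sized by ceiling division. blocks_for and out_of_range_threads are illustrative names, not part of the original code.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks of `tb` threads needed to cover
   `n` elements along one dimension (integer ceiling division). */
int blocks_for(int n, int tb) {
    assert(n >= 0 && tb > 0);
    return (n + tb - 1) / tb;
}

/* Hypothetical helper: how many threads of a 2D grid sized with blocks_for()
   get a (row, col) outside an (N+1) x (N+1) array. Each such thread must be
   skipped by a guard such as `if (row <= N && col <= N)` in the kernel, or it
   indexes outside its intended element, possibly past the end of the
   cudaMalloc'ed buffer. */
long out_of_range_threads(int N, int TBx, int TBy) {
    long side = (long)N + 1;                        /* elements per dimension */
    long nx = (long)blocks_for(N + 1, TBx) * TBx;   /* threads along x */
    long ny = (long)blocks_for(N + 1, TBy) * TBy;   /* threads along y */
    return nx * ny - side * side;
}
```

For example, with 16 x 16 blocks, out_of_range_threads(255, 16, 16) is 0, while out_of_range_threads(256, 16, 16) is 7935. N = 128 also yields excess threads (4095), so this arithmetic alone does not explain the N >= 256 threshold. It only shows where a guard is needed.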
Note that I am calling kernel_wrapper multiple times. There is no problem at all during the first two executions of kernel_wrapper, but on the third execution I get a segfault at the line marked ** (see the code comment above). Some other points to note:
- Both cudaMalloc calls return 0 during the first two executions of kernel_wrapper. On the third execution, cudaMalloc returns 4. What does this mean?
- This problem happens only for N >= 256. N = 128 and smaller values work fine for any number of calls to kernel_wrapper.
- If I comment out the call to actualKernel (the kernel itself), the segfault goes away.
- I have not posted the entire code because it is not tidy and is tough to understand.
- I am using CUDA 2.0 on a linux machine.
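On the value 4: in the CUDA 2.x runtime, cudaError_t is an enum declared in driver_types.h, and 4 is cudaErrorLaunchFailure (cudaGetErrorString reports it as "unspecified launch failure"). Kernel launches are asynchronous. If actualKernel faults, the error can therefore surface as the return value of a later, unrelated runtime call such as the next cudaMalloc, rather than at the launch line itself. The plain C sketch below only mirrors the first few CUDA 2.x enum entries so it compiles without the toolkit. cuda2_error_name is a hypothetical helper. In real code, the usual check is to call cudaGetLastError() and then cudaThreadSynchronize() right after actualKernel<<<...>>>, and to print cudaGetErrorString() for any non-zero result. Newer toolkits renumbered this enum, so the table applies to CUDA 2.x only.

```c
#include <stddef.h>

/* Illustrative mirror of the first entries of the CUDA 2.x runtime's
   cudaError enum (declared in driver_types.h). Real code should include
   cuda_runtime.h and call cudaGetErrorString() instead of this table. */
const char *cuda2_error_name(int code) {
    static const char *const names[] = {
        "cudaSuccess",                   /* 0 */
        "cudaErrorMissingConfiguration", /* 1 */
        "cudaErrorMemoryAllocation",     /* 2 */
        "cudaErrorInitializationError",  /* 3 */
        "cudaErrorLaunchFailure",        /* 4: exception while a kernel ran */
        "cudaErrorPriorLaunchFailure",   /* 5 */
    };
    size_t count = sizeof names / sizeof names[0];
    if (code < 0 || (size_t)code >= count)
        return "unknown code (use cudaGetErrorString)";
    return names[code];
}
```

For instance, cuda2_error_name(4) returns "cudaErrorLaunchFailure".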
Any help will be appreciated. Thanks!