OK so I’ve got a CUDA program that’s resampling some input data. You give it a ratio U/D and it resamples the input data for you.
I noticed when I changed the code that calculates my output buffer size from:
out_buff_size = (int)round(res_ratio * (float)PTS_PER_ITER+.5);
to
out_buff_size = (int)round(res_ratio * (float)PTS_PER_ITER);
I started getting unspecified launch errors. (Note that the above changed the size of my output buffer from 180225 to 180224 elements, no biggie).
A little investigating led me back to some buffers that didn’t seem to be allocated correctly, so I changed these two lines:
CUDA_SAFE_CALL(cudaMalloc((void**)&dev_out_buff_si, sizeof(short) * out_buff_size));
CUDA_SAFE_CALL(cudaMalloc((void**)&dev_out_buff_sf, sizeof(float) * out_buff_size));
to
CUDA_SAFE_CALL(cudaMalloc((void**)&dev_out_buff_si, sizeof(short) * (out_buff_size+1)));
CUDA_SAFE_CALL(cudaMalloc((void**)&dev_out_buff_sf, sizeof(float) * (out_buff_size+1)));
And the error resolved itself (note that the CUDA_SAFE_CALL never reported an error from cudaMalloc).
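One thing the +1 fix would be consistent with is something writing to index out_buff_size, one past the end of the buffer. That is an assumption, not something I have confirmed in my kernel. Below is a plain C sketch of how an extra guard element can reveal that kind of overrun on the host. The names write_inclusive, write_exclusive and overruns_by_one are hypothetical stand-ins, not functions from my program.

```c
#include <stdlib.h>

#define GUARD ((short)0x7A5A)

/* Hypothetical stand-in for a writer with an off-by-one: the loop bound is
   <= n, so it also writes index n, one past an n-element buffer. */
static void write_inclusive(short *out, int n) {
    for (int i = 0; i <= n; ++i) out[i] = 0;
}

/* Hypothetical correct writer: touches only indices 0 .. n-1. */
static void write_exclusive(short *out, int n) {
    for (int i = 0; i < n; ++i) out[i] = 0;
}

/* Allocates n + 1 elements, plants a guard value in the extra slot, runs
   the writer, and reports whether the guard was overwritten.
   Returns 1 if clobbered, 0 if intact, -1 if the allocation failed. */
static int overruns_by_one(void (*writer)(short *, int), int n) {
    short *buf = (short *)malloc(sizeof(short) * ((size_t)n + 1));
    if (!buf) return -1;
    buf[n] = GUARD;
    writer(buf, n);
    int clobbered = (buf[n] != GUARD);
    free(buf);
    return clobbered;
}
```

Here overruns_by_one(write_inclusive, 180224) returns 1 and overruns_by_one(write_exclusive, 180224) returns 0. The same idea could be applied on the device by copying the extra element back after the kernel runs, though I have not tried that.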
Is this a bug with cudaMalloc where it doesn’t properly allocate buffers of certain sizes? I’m using SDK 2.0 on Linux. Note also that I tried cutting my input buffer size in half, and still got the same behavior…