I have been trying to use CUDA from a C++ host program, but I have run into a few problems and was hoping someone could help me out.
What I’m trying to do is pass a float array of a given size from the C++ source to a function in the .cu file, and from there on to the CUDA kernel.
I have managed to integrate the .cu file with the C++ project, and everything works fine with small arrays.
The problem appears when I increase the size of the array beyond about 1000 elements. It compiles fine, but it crashes when I run the console application.
What I ultimately want to pass is an audio input buffer, since I want to write an audio plugin that (hopefully) uses CUDA for the DSP calculations. So the array will probably be quite large.
Here is some code…
This is where I set the number of blocks and threads per block. Although I haven’t yet figured out exactly how it works, I believe that if I want to allocate memory for an array of, say, 10000 elements, I can do the following:
[codebox]const unsigned int numBlocks = 100;
const unsigned int numThreadsPerBlock = 100; // 100 blocks x 100 threads = 10000 threads
cutilCondition(0 == (len % 4));
// one float per thread: 10000 * sizeof(float) bytes
const unsigned int mem_size = sizeof(float) * numBlocks * numThreadsPerBlock;[/codebox]
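For a buffer whose length isn't a neat 100 x 100, a common approach is to fix the threads per block and derive the block count by rounding up, then let each thread check its index against the length. The host-only C++ below is a sketch of that arithmetic, not the poster's code; `blocksFor`, `bufferBytes` and the value 256 are illustrative assumptions. The threads-per-block limit itself is device dependent (512 on compute capability 1.x GPUs, 1024 on later ones), so launching a single block with more threads than that fails.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest block count such that
// numBlocks * threadsPerBlock >= n (ceiling division).
unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical helper: bytes to allocate for n floats. The buffer is sized by
// the data length, not by numBlocks * threadsPerBlock; surplus threads in the
// last block are expected to skip work via an index check in the kernel.
std::size_t bufferBytes(unsigned int n)
{
    return sizeof(float) * n;
}
```

With these, `blocksFor(10000, 100)` gives 100, matching the configuration above, while `blocksFor(10000, 256)` gives 40 (10240 threads, the last 240 of which do nothing).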
In the kernel, I then get the current thread ID like this:
[codebox]int ID = blockIdx.x * blockDim.x + threadIdx.x;
// Or, according to what I’ve seen elsewhere:
// int ID = threadIdx.x * threadIdx.y;[/codebox]
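The first formula is the usual global index for a one-dimensional grid. The host-only C++ emulation below (the names `globalIndex` and `writesPerElement` are made up for illustration, and nothing here runs on the GPU) counts how many threads would touch each element, showing that the first formula gives every element exactly one thread when the grid covers the array. The second formula does not: with one-dimensional blocks `threadIdx.y` is always 0, so `threadIdx.x * threadIdx.y` is 0 for every thread, and even with two-dimensional blocks it ignores `blockIdx` and repeats values (1 x 2 and 2 x 1 both give 2).

```cpp
#include <cassert>
#include <vector>

// Host-side stand-in for the kernel expression
// blockIdx.x * blockDim.x + threadIdx.x (illustration only).
unsigned int globalIndex(unsigned int blockIdxX, unsigned int blockDimX,
                         unsigned int threadIdxX)
{
    return blockIdxX * blockDimX + threadIdxX;
}

// Emulates a 1D launch of numBlocks x threadsPerBlock threads and counts how
// many threads would write each element of an n-element array. Threads whose
// index is >= n are skipped, as an "if (ID < n)" guard in the kernel would do.
std::vector<unsigned int> writesPerElement(unsigned int n, unsigned int numBlocks,
                                           unsigned int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    std::vector<unsigned int> writes(n, 0);
    for (unsigned int b = 0; b < numBlocks; ++b) {
        for (unsigned int t = 0; t < threadsPerBlock; ++t) {
            const unsigned int id = globalIndex(b, threadsPerBlock, t);
            if (id < n) {
                ++writes[id];
            }
        }
    }
    return writes;
}
```

For example, `writesPerElement(10000, 100, 100)` returns all ones, while `writesPerElement(10000, 39, 256)` leaves the last 16 elements at 0 because 39 x 256 = 9984 threads are too few.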
I don’t understand why it’s not working. Also, what is this for:
[codebox]cutilCondition(0 == (len % 4));[/codebox]
What does it do, and why is the number 4 there?
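`cutilCondition` comes from the `cutil` helper library that shipped with older CUDA SDK samples, and it appears to act as a runtime assertion that stops the program when the expression is false. Read that way, the line asserts that `len` is a multiple of 4, which typically matters when each thread processes four elements at once (for instance through a `float4`); whether the original sample relies on that is an assumption. If the code does depend on it, one option is to zero-pad the buffer up to the next multiple of 4. The helpers below are hypothetical and assume that trailing zero samples are harmless for the processing step.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: round n up to the next multiple of m (m > 0).
unsigned int roundUpToMultiple(unsigned int n, unsigned int m)
{
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Hypothetical helper: copies the input into a zero-padded buffer whose length
// is a multiple of 4, so a check such as (0 == (len % 4)) would hold.
std::vector<float> padToMultipleOf4(const std::vector<float>& input)
{
    const unsigned int len = static_cast<unsigned int>(input.size());
    std::vector<float> padded(roundUpToMultiple(len, 4), 0.0f);
    for (unsigned int i = 0; i < len; ++i) {
        padded[i] = input[i];
    }
    return padded;
}
```

A 1001-sample buffer, for example, becomes 1004 samples, with the last three set to zero.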
It crashes whenever I pass a large array from the C++ code to CUDA. Sometimes it just gives me random results for the processed array, which makes me think it might be a synchronization problem, but I honestly don’t know.
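Random results are easier to pin down by comparing the GPU output with a plain CPU reference computed on the same input, in the spirit of the CPU "gold" reference functions used in the SDK samples. The sketch below assumes a simple per-sample gain as a stand-in for the real DSP step; `applyGainReference`, `firstMismatch` and the tolerance are hypothetical choices. It is also worth checking the return value of each CUDA runtime call and calling `cudaGetLastError()` after the kernel launch: a launch that fails (for example, with too many threads per block) never writes the output buffer, so copying it back yields whatever the device memory happened to contain.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a stand-in DSP step: multiply each sample by gain.
std::vector<float> applyGainReference(const std::vector<float>& in, float gain)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * gain;
    }
    return out;
}

// Returns the index of the first element where ref and out differ by more than
// tolerance (NaN also counts as a mismatch), or ref.size() if all match.
std::size_t firstMismatch(const std::vector<float>& ref,
                          const std::vector<float>& out, float tolerance)
{
    assert(ref.size() == out.size());
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (!(std::fabs(ref[i] - out[i]) <= tolerance)) {
            return i;
        }
    }
    return ref.size();
}
```

The first mismatching index can be a useful hint: if it lands exactly at the number of threads actually launched, the grid probably doesn't cover the whole array.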
If anyone knows what I’m getting wrong here, please help!