Hey, I am facing a weird problem. If I add some extra printf statements to my code, it shows the correct result, and if I remove those unnecessary printf statements, the answer is wrong. What's happening?
Please reply if anyone is facing the same problem and has a solution.
I tried cudaThreadSynchronize(). I tried every possible combination of cudaThreadSynchronize() with and without printf(), but it only works in the form given in the file. That's ridiculous and is driving me crazy. And if I go on to the further processing that is required to complete the algorithm, these results get modified.
This is making my head spin: code that is going to execute in the future affects the present execution. That's awful.
Please tell me the proper procedure to call and execute any CUDA kernel completely, with all the required synchronization statements and other precautions.
Hey everyone, I found the bug. It is the use of an array in a structure. :)
CUDA supports structures, but using an array inside a structure caused problems in my code. I knew pointers inside a structure also cause problems, but now I have found that using an array is risky business too. So always keep the array separate from the structure.
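For anyone who wants to see what "array separate from the structure" could look like, here is a minimal sketch. It is not the code from the original problem: the names Params, scale_kernel and run_separate_array_demo are made up for illustration, and the doubling operation is just a placeholder. The structure carries only scalars and is passed by value, while the array gets its own device allocation and is passed as a separate pointer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter struct: scalars only, no array or pointer members.
struct Params {
    int   n;      // number of elements in the separate array
    float scale;  // factor applied to every element
};

// The array is passed as its own device pointer, next to the struct.
__global__ void scale_kernel(Params p, float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.n)              // guard against threads past the end
        data[i] *= p.scale;
}

// Returns 0 on success, 1 on any CUDA error or wrong result.
int run_separate_array_demo(void)
{
    const int n = 2048;
    static float host[n];
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    float *dev = NULL;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess)
        return 1;

    int failed = 0;
    if (cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice) != cudaSuccess)
        failed = 1;

    if (!failed) {
        Params p = { n, 2.0f };
        scale_kernel<<<(n + 255) / 256, 256>>>(p, dev);
        // Launch errors are reported by cudaGetLastError().
        if (cudaGetLastError() != cudaSuccess)
            failed = 1;
    }

    // A blocking cudaMemcpy waits for the preceding kernel to finish.
    if (!failed &&
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        failed = 1;

    cudaFree(dev);

    for (int i = 0; !failed && i < n; ++i)
        if (host[i] != 2.0f * (float)i)
            failed = 1;

    return failed;
}
```

Calling run_separate_array_demo() from main() and checking that it returns 0 is enough to try it out.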
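#include <cuda_runtime.h>
#include <cuComplex.h>

typedef cuComplex COMPLEX; // assumption: COMPLEX is the single-precision type from cuComplex.h

int main()
{
    COMPLEX *DATA_IN = NULL; // I want 2048 DATA_IN elements in a 1D array on the GPU
    // allocate all of them in GPU global memory with one call
    if (cudaMalloc((void **)&DATA_IN, 2048 * sizeof(COMPLEX)) != cudaSuccess)
        return 1;
    .......
    cudaFree(DATA_IN);
    return 0;
}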
Well, I have seen some CUDA SDK samples that use the cuComplex.h header, and they use this method. So allocate all the data in a single cudaMalloc().
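As a rough sketch of the whole sequence (one allocation, copy in, launch, synchronize, copy out), something like the following should work. The helper check(), the kernel conj_kernel and the function run_complex_demo are hypothetical names chosen for this example, and taking the complex conjugate is only a stand-in for the real processing. It uses cudaDeviceSynchronize(), which replaced the now deprecated cudaThreadSynchronize(), and it checks the return code of every call so a failure shows up where it happens instead of as a wrong result later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuComplex.h>

#define N 2048

// Hypothetical helper: report a failed CUDA call and return nonzero.
static int check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

// Hypothetical kernel: replace each element with its complex conjugate.
__global__ void conj_kernel(cuComplex *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = cuConjf(data[i]);
}

// Returns 0 on success, nonzero on any CUDA error or wrong result.
int run_complex_demo(void)
{
    static cuComplex host[N];
    for (int i = 0; i < N; ++i)
        host[i] = make_cuFloatComplex((float)i, 1.0f);

    // One cudaMalloc() for the whole 1D array.
    cuComplex *dev = NULL;
    int failed = check(cudaMalloc((void **)&dev, N * sizeof(cuComplex)), "cudaMalloc");

    if (!failed)
        failed = check(cudaMemcpy(dev, host, N * sizeof(cuComplex),
                                  cudaMemcpyHostToDevice), "copy to device");
    if (!failed) {
        conj_kernel<<<(N + 255) / 256, 256>>>(dev, N);
        failed = check(cudaGetLastError(), "kernel launch");
    }
    // Wait for the kernel and pick up any error raised while it ran.
    if (!failed)
        failed = check(cudaDeviceSynchronize(), "kernel execution");
    if (!failed)
        failed = check(cudaMemcpy(host, dev, N * sizeof(cuComplex),
                                  cudaMemcpyDeviceToHost), "copy to host");

    cudaFree(dev); // freeing a NULL pointer is a no-op

    for (int i = 0; !failed && i < N; ++i)
        if (cuCrealf(host[i]) != (float)i || cuCimagf(host[i]) != -1.0f)
            failed = 1;

    return failed;
}
```

If results still change when printf statements are added or removed, the messages printed by check() are a good first place to look, since an unchecked error from an earlier call can otherwise go unnoticed.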