Hi guys,
I'm having a problem where the order in which I cudaMalloc my variables affects whether I can access them. Specifically, for code like this:
[codebox] cudaMalloc((void**)&kxihlG,nElkxihl*sizeof(float));
cudaMalloc((void**)&kxihrG,nElkxihr*sizeof(float));
cudaMemcpy(kxihlG, kxihl, nElkxihl*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(kxihrG, kxihr, nElkxihr*sizeof(float), cudaMemcpyHostToDevice);
// ... lots of variables
cudaMalloc((void**)&TARGETVAR,nElTARGETVAR*sizeof(float));
cudaMemcpy(TARGETVARG, TARGETVAR, nElTARGETVAR*sizeof(float), cudaMemcpyHostToDevice);
test<<<1,1>>>(TARGETVARG);
Error = cudaThreadSynchronize();
fprintf(stderr,"@TEST1 Error = %d \n",Error);[/codebox]
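To see which call is actually failing, it may help to check the return value of every runtime call, not only the final synchronize, because errors from earlier calls or from the launch itself can surface later. The sketch below assumes a CUDA 4.0 or newer toolkit, where cudaDeviceSynchronize replaced the deprecated cudaThreadSynchronize. The CUDA_CHECK macro, the array size and the test kernel body are hypothetical placeholders, since the real kernel isn't shown. It prints cudaGetErrorString(...) instead of the raw number: on toolkits of that era code 4 was cudaErrorLaunchFailure, but the numeric values have changed between CUDA versions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report file/line and a readable message,
// then abort, if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,    \
                         __LINE__, #call, cudaGetErrorString(err_));    \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Placeholder for the real test kernel, whose body is not shown.
__global__ void test(float *data) { data[0] += 1.0f; }

int main() {
    constexpr size_t nElTARGETVAR = 16;        // placeholder size
    float TARGETVAR[nElTARGETVAR] = {0};       // host data
    float *TARGETVARG = nullptr;               // device pointer

    // Allocate into the *device* pointer, then copy host -> device.
    CUDA_CHECK(cudaMalloc((void **)&TARGETVARG, nElTARGETVAR * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(TARGETVARG, TARGETVAR, nElTARGETVAR * sizeof(float),
                          cudaMemcpyHostToDevice));

    test<<<1, 1>>>(TARGETVARG);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(TARGETVAR, TARGETVARG, sizeof(float),
                          cudaMemcpyDeviceToHost));
    std::printf("TARGETVAR[0] = %f\n", TARGETVAR[0]);
    CUDA_CHECK(cudaFree(TARGETVARG));
    return 0;
}
```

With every call wrapped, the first failing cudaMalloc or cudaMemcpy is reported by file and line instead of showing up later as a kernel failure.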
If I run it like this, cudaThreadSynchronize returns error code 4, but if I move the cudaMalloc for TARGETVAR to the top of the list, the test<<<>>> kernel runs successfully:
THIS WORKS
[codebox]
cudaMalloc((void**)&TARGETVAR,nElTARGETVAR*sizeof(float));
cudaMalloc((void**)&kxihlG,nElkxihl*sizeof(float));
cudaMalloc((void**)&kxihrG,nElkxihr*sizeof(float));
cudaMemcpy(kxihlG, kxihl, nElkxihl*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(kxihrG, kxihr, nElkxihr*sizeof(float), cudaMemcpyHostToDevice);
// ... lots of variables
cudaMemcpy(TARGETVARG, TARGETVAR, nElTARGETVAR*sizeof(float), cudaMemcpyHostToDevice);
test<<<1,1>>>(TARGETVARG);
Error = cudaThreadSynchronize();
fprintf(stderr,"@TEST1 Error = %d \n",Error);[/codebox]
Can anyone explain this? I don't think the machine is out of memory. Do I have to add some delay? Why does the order matter, as long as the cudaMalloc for a given variable comes before its cudaMemcpy?
How can I fix this? The same problem is occurring elsewhere with other variables. Thank you for your time!
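One thing worth double-checking in both snippets above: cudaMalloc is given &TARGETVAR, while the cudaMemcpy and the kernel launch use TARGETVARG. If that is in the real code and not just a typo in the post, TARGETVARG is never allocated and the host variable TARGETVAR gets overwritten with a device address. That is undefined behavior, and undefined behavior could easily appear to depend on unrelated things such as statement order. A hypothetical helper like toDevice below (the name, array sizes and kernel body are all placeholders) keeps each allocation paired with its copy, so the host and device names can't drift apart.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocate a device buffer of n floats and fill it
// from `host`. Returns the device pointer, or nullptr if either step fails.
static float *toDevice(const float *host, size_t n) {
    float *dev = nullptr;
    cudaError_t err = cudaMalloc((void **)&dev, n * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    err = cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return nullptr;
    }
    return dev;
}

// Placeholder for the real test kernel.
__global__ void test(float *data) { data[0] += 1.0f; }

int main() {
    float kxihl[4] = {1, 2, 3, 4};
    float TARGETVAR[8] = {0};

    // Each device pointer comes straight from cudaMalloc inside the helper,
    // so a host pointer cannot be overwritten by mistake.
    float *kxihlG = toDevice(kxihl, 4);
    float *TARGETVARG = toDevice(TARGETVAR, 8);
    if (!kxihlG || !TARGETVARG) {
        cudaFree(kxihlG);      // cudaFree(nullptr) is a no-op
        cudaFree(TARGETVARG);
        return 1;
    }

    test<<<1, 1>>>(TARGETVARG);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("@TEST1 Error = %d (%s)\n", (int)err, cudaGetErrorString(err));

    cudaFree(kxihlG);
    cudaFree(TARGETVARG);
    return err == cudaSuccess ? 0 : 1;
}
```

The helper also frees the buffer if the copy fails, so a failed variable doesn't leave a half-initialized allocation behind.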